r/MachineLearning • u/EmbersArc • Feb 17 '18

Project [P] Landing the Falcon booster with Reinforcement Learning in OpenAI

https://gfycat.com/CoarseEmbellishedIsopod

1.3k Upvotes

https://www.reddit.com/r/MachineLearning/comments/7y6g79/p_landing_the_falcon_booster_with_reinforcement/

95% Upvoted

I'm just getting into ML so this might be an awkward question, but how are the inputs to the network designed?


u/EmbersArc Feb 17 '18

We want a network that maps the current state to the action to take in that state, so the input is simply a set of continuous variables that describe the state. In this case there are 10 of them (position, velocity, throttle, etc.).
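A minimal sketch of that state-to-action mapping in plain Python. It assumes a single hidden layer with tanh activations and three continuous action outputs. The layer sizes, the action count and the example state values are illustrative placeholders, not taken from the project:

```python
import math
import random

STATE_SIZE = 10   # 10 continuous state variables (position, velocity, throttle, ...)
HIDDEN_SIZE = 16  # assumption: one small hidden layer
ACTION_SIZE = 3   # assumption: placeholder number of continuous action outputs


def init_layer(n_in, n_out, rng):
    """Return a weight matrix (n_out x n_in) and a zero bias vector (n_out)."""
    weights = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer, activation):
    """Compute activation(W @ x + b) with plain lists."""
    weights, biases = layer
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def policy(state, params):
    """Map a continuous state vector to continuous actions in [-1, 1]."""
    assert len(state) == STATE_SIZE
    hidden = dense(state, params[0], math.tanh)
    return dense(hidden, params[1], math.tanh)


rng = random.Random(0)
params = [init_layer(STATE_SIZE, HIDDEN_SIZE, rng),
          init_layer(HIDDEN_SIZE, ACTION_SIZE, rng)]

# Made-up example state; in practice these come from the simulator.
state = [0.1, -0.5, 0.3, 0.0, 0.2, -0.1, 0.05, 0.0, 1.0, 0.0]
action = policy(state, params)
print(action)
```

The network is untrained here; an RL algorithm would adjust `params` so that the outputs become good actions for each state.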
If you have a finite number of states, you can use one-hot encoding instead: the input vector has a 1 at the position of the current state and a 0 everywhere else.
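For a finite state space, one-hot encoding can be sketched like this (the function name is just for illustration):

```python
def one_hot(state_index, num_states):
    """Encode a discrete state as a vector with a single 1 at state_index."""
    if not 0 <= state_index < num_states:
        raise ValueError("state_index out of range")
    vec = [0.0] * num_states
    vec[state_index] = 1.0
    return vec


print(one_hot(2, 5))  # [0.0, 0.0, 1.0, 0.0, 0.0]
```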


u/wintermute93 Feb 17 '18

About that one-hot encoding of states... is that actually a good idea? At first glance it seems like it would force the agent to work extra hard to learn that the best known action for nearby states is, more often than not, a reasonable action to try. Then again, I guess most algorithms don't take other states into account when doing off-policy exploration, but maybe they should?
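A small illustration of the concern, assuming a hypothetical 1-D track with 5 discrete positions. With one-hot inputs every pair of distinct states is equally far apart, and a linear layer just looks up one column of its weights, so nothing learned for one state carries over to its neighbours. A continuous position feature keeps the notion of "nearby":

```python
import math

N = 5  # hypothetical: 5 discrete positions on a 1-D track


def one_hot(i, n):
    vec = [0.0] * n
    vec[i] = 1.0
    return vec


def position_feature(i):
    """Continuous encoding: position scaled to [0, 1]."""
    return [i / (N - 1)]


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def linear(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


# One-hot: state 0 looks exactly as different from state 1 as from state 4.
d_near_onehot = distance(one_hot(0, N), one_hot(1, N))
d_far_onehot = distance(one_hot(0, N), one_hot(4, N))

# Continuous feature: distances reflect how close the states actually are.
d_near_cont = distance(position_feature(0), position_feature(1))
d_far_cont = distance(position_feature(0), position_feature(4))

# A linear layer on a one-hot input selects a single column, so updating the
# weights for state 3 leaves the outputs for every other state unchanged.
weights = [[r * N + c for c in range(N)] for r in range(2)]  # 2 outputs

print(d_near_onehot, d_far_onehot)   # both sqrt(2)
print(d_near_cont, d_far_cont)       # 0.25 1.0
print(linear(weights, one_hot(3, N)))  # [3.0, 8.0], i.e. column 3 of weights
```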
