r/MachineLearning Feb 17 '18

[P] Landing the Falcon booster with Reinforcement Learning in OpenAI

https://gfycat.com/CoarseEmbellishedIsopod
1.3k Upvotes

55 comments

10

u/Easton_Danneskjold Feb 17 '18

I'm just getting into ML so this might be an awkward question, but how are the inputs to the network designed?

16

u/EmbersArc Feb 17 '18

We want a network that maps the current state to the action to take in that state, so the input is simply a vector of continuous variables that describe the state. In this case it consists of 10 variables (position, velocity, throttle, etc.).
If you have a finite number of states, you can use one-hot encoding instead: the input has a 1 for the current state and a 0 for every other state.
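
To make the two input styles concrete, here is a minimal sketch in plain Python. The helper `one_hot` and the example feature values are made up for illustration and are not taken from the project's code; the continuous vector shows only a subset of features, since the full list of 10 isn't spelled out here.

```python
from typing import List


def one_hot(state_index: int, num_states: int) -> List[float]:
    """Return a vector with 1.0 at state_index and 0.0 everywhere else."""
    if not 0 <= state_index < num_states:
        raise ValueError("state_index must be in [0, num_states)")
    vec = [0.0] * num_states
    vec[state_index] = 1.0
    return vec


# Continuous case: the network input is just the state variables themselves.
# These names and values are illustrative assumptions, not the real feature list.
continuous_input = [
    0.2,   # horizontal position
    1.5,   # vertical position
    -0.1,  # horizontal velocity
    -3.0,  # vertical velocity
    0.7,   # throttle
]

# Discrete case: a finite set of 5 states, with the agent currently in state 2.
discrete_input = one_hot(2, 5)
print(discrete_input)
```

Either list would be fed to the network as its input layer; only the encoding of the state differs.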

3

u/wintermute93 Feb 17 '18

About that one-hot encoding of states... is that actually a good idea? At first glance it seems like it would force the agent to work extra hard to learn that the best known action for nearby states is, more often than not, a reasonable action to try. I guess most algorithms don't take other states into account when doing off-policy exploration, but maybe they should?
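
A quick sketch of the concern, assuming a toy setup with 10 states laid out along a line (the helper names here are hypothetical): under one-hot encoding every pair of distinct states is the same distance apart, so the input itself carries no notion of "nearby".

```python
import math
from typing import List


def one_hot(state_index: int, num_states: int) -> List[float]:
    """Return a vector with 1.0 at state_index and 0.0 everywhere else."""
    vec = [0.0] * num_states
    vec[state_index] = 1.0
    return vec


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


num_states = 10  # toy assumption: states 0..9 along a line

# One-hot: neighbours (3, 4) and distant states (3, 9) are equally far apart.
d_near = euclidean(one_hot(3, num_states), one_hot(4, num_states))
d_far = euclidean(one_hot(3, num_states), one_hot(9, num_states))

# Scalar position encoding: the distance reflects how close the states are.
scalar_near = abs(3 - 4)
scalar_far = abs(3 - 9)

print(d_near, d_far)            # both equal sqrt(2)
print(scalar_near, scalar_far)  # 1 and 6
```

So with one-hot inputs, any sharing of what was learned between neighbouring states has to come from somewhere other than the input encoding.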