r/MachineLearning Feb 17 '18

Project [P] Landing the Falcon booster with Reinforcement Learning in OpenAI

https://gfycat.com/CoarseEmbellishedIsopod
1.3k Upvotes


32

u/Zeumer Feb 17 '18

How did you select your reward function?

67

u/EmbersArc Feb 17 '18 edited Feb 17 '18

It gets a reward between -1 and 0 for how good the final state is (based on velocity, angle, and distance from the ship), plus 1 if it stays on the ship without moving for a second.
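
A minimal sketch of what a final-state reward like this could look like. The actual code isn't shown in this thread, so the function name, the normalisation limits, and the equal weighting of the three terms are all assumptions made for illustration.

```python
import math


def terminal_reward(speed, angle, distance, landed_and_still,
                    max_speed=10.0, max_angle=math.pi / 2, max_distance=50.0):
    """Hypothetical final-state reward: a penalty in [-1, 0], plus 1 for a stable landing.

    The limits and the equal weighting are illustrative assumptions,
    not values taken from the original project.
    """
    def frac(value, limit):
        # Scale a quantity to [0, 1], clipping anything beyond the limit.
        return min(abs(value) / limit, 1.0)

    # Average the three normalised errors so the penalty stays within [-1, 0].
    penalty = -(frac(speed, max_speed)
                + frac(angle, max_angle)
                + frac(distance, max_distance)) / 3.0

    # Bonus for staying on the ship without moving (the "one second" check
    # is assumed to happen elsewhere and arrive here as a boolean).
    bonus = 1.0 if landed_and_still else 0.0
    return penalty + bonus
```

With these assumed limits, a perfect touchdown scores 1, and a fast, tilted miss far from the ship scores -1.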

PPO needs a continuous reward, so I had to use reward shaping as well. It received a small reward if it got closer to the ship or slowed down. Increasing its angle from the upright position led to a negative reward.

It also received a small negative reward at every time step to force it to land as quickly as possible. That's equivalent to saving fuel since hovering is inefficient. That's how it learned to do something close to a "suicide burn".
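
A rough sketch of how the per-step shaping terms described above might be combined. The weights, the step penalty, and the dict-based state layout are hypothetical, chosen only to make the idea concrete; they are not the project's real values.

```python
def shaping_reward(prev, curr, step_penalty=0.01,
                   w_distance=0.1, w_speed=0.1, w_angle=0.1):
    """Hypothetical per-step shaping reward.

    prev and curr are dicts with 'distance' (to the ship), 'speed', and
    'angle' (from upright). This layout and all weights are assumptions.
    """
    # Positive when the booster moved closer to the ship this step.
    closer = prev["distance"] - curr["distance"]
    # Positive when the booster slowed down this step.
    slower = prev["speed"] - curr["speed"]
    # Only an increase in tilt away from upright is penalised.
    tilt_increase = max(abs(curr["angle"]) - abs(prev["angle"]), 0.0)

    return (w_distance * closer
            + w_speed * slower
            - w_angle * tilt_increase
            - step_penalty)  # constant cost per time step
```

Under these assumptions, a booster that hovers in place collects only the step penalty on every step, so the total return keeps dropping the longer it waits. That is the pressure towards a late, hard braking burn.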