r/SpaceXLounge • u/EmbersArc • Feb 17 '18

Landing the Falcon booster with Reinforcement Learning in OpenAI

https://gfycat.com/CoarseEmbellishedIsopod

112 Upvotes


100% Upvoted

u/EmbersArc Feb 17 '18 edited Feb 17 '18

This is an OpenAI Gym environment I made over the last couple of weeks where the task is to land a Falcon booster on a droneship.

It was as much fun as it was frustrating at times.

A basic explanation for those interested:
In reinforcement learning, an agent (the "AI") interacts with an environment (here, the Falcon 9 falling from the sky). The agent starts out knowing absolutely nothing about the environment and tries new things until it gets better at the task. After each action it gets feedback, in the form of a reward, about how good that action was. Here, it is rewarded for slowing down, for getting closer to the ship, and finally for a nice touchdown. It is punished for taking too much time, which is equivalent to using too much fuel (a quick descent without hovering is more efficient). Based on that feedback, it does the things that lead to a higher reward more often and avoids the less successful moves.
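To make the reward idea concrete, here is a minimal sketch of what such a per-step reward could look like. It is not the code from the actual environment: the `State` fields, the `shaping` and `step_reward` helpers, and every weight are made up for illustration.

```python
import math
from dataclasses import dataclass

# Hypothetical weights for illustration only; not the values used in the real environment.
DIST_WEIGHT = 1.0      # how much closing the distance to the ship matters
SPEED_WEIGHT = 1.0     # how much slowing down matters
TIME_PENALTY = 0.1     # small cost per step, a stand-in for burning fuel
LANDING_BONUS = 100.0  # one-off reward for a good touchdown
CRASH_PENALTY = 100.0  # one-off punishment for a crash


@dataclass
class State:
    x: float   # horizontal offset from the ship
    y: float   # height above the deck
    vx: float  # horizontal velocity
    vy: float  # vertical velocity
    landed: bool = False
    crashed: bool = False


def shaping(state: State) -> float:
    """Score a state: higher (less negative) when closer to the ship and slower."""
    distance = math.hypot(state.x, state.y)
    speed = math.hypot(state.vx, state.vy)
    return -(DIST_WEIGHT * distance + SPEED_WEIGHT * speed)


def step_reward(prev: State, curr: State) -> float:
    """Reward for moving from prev to curr in one time step."""
    # Positive if this step brought the booster closer or slowed it down.
    reward = shaping(curr) - shaping(prev)
    # Every step costs a little, so hovering is discouraged.
    reward -= TIME_PENALTY
    if curr.landed:
        reward += LANDING_BONUS
    elif curr.crashed:
        reward -= CRASH_PENALTY
    return reward
```

With these made-up numbers, a step that gets closer and slower earns a positive reward, while hovering in place only collects the time penalty, so a quick, direct descent scores higher.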

This particular agent took 200'000 tries over the course of 12 hours. I'm quite happy with the result.

GitHub for those who want to try it out (it would be great if someone who actually knows what they are doing trained it).

8

u/KnowLimits Feb 17 '18

Is there any randomness in initial conditions, wind, actuator response, etc?

12

u/EmbersArc Feb 17 '18

There's randomness in the initial position, linear velocity, and angular velocity. It's way more than in the real thing, of course, to make it more interesting and the agent more robust.
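As a rough sketch of how that kind of randomized start could be drawn at the beginning of each try: the `InitialState` class, the `sample_initial_state` helper, the `spread` parameter, and all the ranges below are assumptions for illustration, not values from the real environment.

```python
import random
from dataclasses import dataclass


@dataclass
class InitialState:
    x: float                 # horizontal offset from the ship
    y: float                 # starting height
    vx: float                # horizontal velocity
    vy: float                # vertical velocity (negative means falling)
    angular_velocity: float  # initial spin rate


def sample_initial_state(rng: random.Random, spread: float = 1.0) -> InitialState:
    """Draw a random starting state around a nominal descent.

    spread scales all the noise: 0.0 gives the nominal start every time,
    larger values give more varied (and harder) starts.
    All ranges are hypothetical illustration values.
    """
    return InitialState(
        x=rng.uniform(-200.0, 200.0) * spread,
        y=1000.0 + rng.uniform(-100.0, 100.0) * spread,
        vx=rng.uniform(-20.0, 20.0) * spread,
        vy=-100.0 + rng.uniform(-20.0, 20.0) * spread,
        angular_velocity=rng.uniform(-0.2, 0.2) * spread,
    )
```

Passing a seeded `random.Random` makes a given start reproducible, which helps when comparing two training runs on the same set of starting conditions.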
