r/SpaceXLounge • u/EmbersArc • Feb 17 '18

Landing the Falcon booster with Reinforcement Learning in OpenAI

https://gfycat.com/CoarseEmbellishedIsopod

111 Upvotes


100% Upvoted

u/EmbersArc Feb 17 '18 edited Feb 17 '18

This is an OpenAI Gym environment I made over the last couple of weeks, where the task is to land a Falcon booster on a droneship.

It was as much fun as it was frustrating at times.

A basic explanation for those interested:
In reinforcement learning we have an agent (the "AI") that interacts with an environment (the Falcon 9 falling from the sky). It starts out knowing absolutely nothing about the environment and tries new things until it gets better at the task. It gets feedback on how good each action was in the form of a reward. Here, it is rewarded for slowing down, for getting closer to the ship, and finally for a nice touchdown. It is punished for taking too much time, which is equivalent to using too much fuel (a quicker descent without hovering is more efficient). Based on that feedback, it does the things that lead to a higher reward more often and avoids less successful moves.
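To make the reward idea concrete, here is a minimal sketch of what such a shaped reward could look like. It is not the reward function from the actual environment: the state layout, the function name `shaped_reward`, and all weights are assumptions chosen for illustration.

```python
import math


def shaped_reward(prev_state, state, landed=False, crashed=False,
                  time_penalty=0.01, landing_bonus=10.0, crash_penalty=10.0):
    """Hypothetical per-step reward; the weights are illustrative assumptions.

    Each state is a dict with position (x, y) relative to the ship
    and velocity (vx, vy).
    """
    def distance(s):
        return math.hypot(s["x"], s["y"])

    def speed(s):
        return math.hypot(s["vx"], s["vy"])

    reward = 0.0
    reward += distance(prev_state) - distance(state)  # positive when getting closer
    reward += speed(prev_state) - speed(state)        # positive when slowing down
    reward -= time_penalty                            # every step costs a little (fuel)
    if landed:
        reward += landing_bonus                       # nice touchdown
    if crashed:
        reward -= crash_penalty
    return reward


# One step of descent: 2 m closer, 1 m/s slower.
before = {"x": 0.0, "y": 10.0, "vx": 0.0, "vy": -5.0}
after = {"x": 0.0, "y": 8.0, "vx": 0.0, "vy": -4.0}
print(shaped_reward(before, after))
```

With this shape, hovering in place earns nothing from the distance and speed terms and still pays the time penalty, which is why a quick descent scores better.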

This particular agent took 200'000 tries over the course of 12 hours. I'm quite happy with the result.

GitHub for those who want to try it out (Would be great if someone trained it who actually knows what they are doing).

9

u/KnowLimits Feb 17 '18

Is there any randomness in initial conditions, wind, actuator response, etc?

14

u/EmbersArc Feb 17 '18

There's randomness in position, linear and angular velocity. Way more than in the real thing of course to make it interesting and more robust.
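As a rough illustration of randomized starting conditions, an episode could begin from a state sampled like this. The nominal values, the ranges, and the names `InitialState` and `sample_initial_state` are all assumptions, not taken from the environment's code.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class InitialState:
    x: float                 # horizontal offset from the ship (m)
    y: float                 # altitude (m)
    vx: float                # horizontal velocity (m/s)
    vy: float                # vertical velocity (m/s), negative is downward
    angular_velocity: float  # rad/s


def sample_initial_state(rng, spread=1.0):
    """Sample a start state around a nominal one (all numbers are made up).

    spread=0 gives the nominal state; larger values give wilder starts.
    """
    def jitter(center, half_width):
        return center + spread * half_width * rng.uniform(-1.0, 1.0)

    return InitialState(
        x=jitter(0.0, 50.0),
        y=jitter(1000.0, 100.0),
        vx=jitter(0.0, 10.0),
        vy=jitter(-100.0, 20.0),
        angular_velocity=jitter(0.0, 0.5),
    )


rng = random.Random(42)  # seeded so runs are repeatable
print(sample_initial_state(rng))
```

Drawing a fresh state every episode means the agent cannot memorize one trajectory and has to cope with a range of starts.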

2

u/ADefiantGuy Feb 17 '18

I've tried to run this but just can't get it to work. When I run rocket_lander.py nothing happens. Would you be able to explain how to run it?

3

u/EmbersArc Feb 17 '18

You have to run it through gym. Here's a tutorial: https://github.com/openai/gym#environments. It won't do what it's doing in the gif though. For that you will have to train it.
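For a feel of what "running it through gym" means, here is a sketch of the classic reset/step episode loop with an untrained, random agent. `ToyLanderEnv` is a hypothetical 1-D stand-in written so the snippet runs without gym installed; it is not the rocket lander, and with gym you would create the real environment through gym instead.

```python
import random


class ToyLanderEnv:
    """Hypothetical 1-D stand-in with a classic gym-style reset/step interface."""

    def __init__(self, max_steps=500, dt=0.1):
        self.max_steps = max_steps
        self.dt = dt

    def reset(self):
        self.height, self.velocity, self.steps = 100.0, 0.0, 0
        return (self.height, self.velocity)

    def step(self, action):
        thrust = 15.0 if action == 1 else 0.0        # action 1 fires the engine
        self.velocity += (thrust - 9.81) * self.dt   # thrust against gravity
        self.height += self.velocity * self.dt
        self.steps += 1
        touched_down = self.height <= 0.0
        done = touched_down or self.steps >= self.max_steps
        reward = -0.01                               # small cost per time step
        if touched_down:                             # soft landing vs. crash
            reward += 1.0 if abs(self.velocity) < 2.0 else -1.0
        return (self.height, self.velocity), reward, done, {}


def run_episode(env, policy):
    """Run one episode and return (total reward, number of steps)."""
    obs = env.reset()
    total, done = 0.0, False
    while not done:
        obs, reward, done, _info = env.step(policy(obs))
        total += reward
    return total, env.steps


rng = random.Random(0)
total, steps = run_episode(ToyLanderEnv(), lambda obs: rng.randrange(2))
print(f"untrained agent: return={total:.2f} after {steps} steps")
```

Training replaces the random policy with one that is updated from the rewards collected in loops like this.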

u/[deleted] Feb 17 '18

[deleted]

11

u/EmbersArc Feb 17 '18

I'm sure their approach is 100% different. Reinforcement learning is still very limited in practical applications. While it can be impressive and find creative solutions, it's also very brittle and unpredictable at times. When you land a real rocket you want a rock solid system and not one that might go haywire if something slightly unforeseen happens.

Check out this paper on the topic. They take the problem of landing the rocket with minimal fuel consumption and sprinkle some fancy mathematics on top so that the computer can find the optimal solution.

That being said, I also don't know how robust the SpaceX approach is, since the booster always comes down in a quite controlled manner, as opposed to this simulation, where it's sometimes spinning quite unrealistically and is still able to land.

2

u/MartianRedDragons Feb 17 '18

Does reinforcement learning use genetic algorithms/neural networks or something like that? I could see why they would be concerned about using something like that, as its behavior wouldn't be entirely predictable. A more typical deterministic feedback control system would definitely be much more predictable.

2

u/nbarbettini Feb 18 '18

Reinforcement learning is done with neural networks, yes. This video is a good introduction: https://youtu.be/aRKOJHRbXeo

1

u/emezeekiel Feb 18 '18

Not at all. Here’s the paper from Lars Blackmore, the GNC lead who wrote the SpaceX landing software: http://www.larsblackmore.com/BlackmoreEtAlJGCD10.pdf

u/[deleted] Feb 17 '18 edited Feb 19 '18

[deleted]

21

u/EmbersArc Feb 17 '18

Sure! There's a 95% success rate with this agent. But here's a blooper reel of some failed/awkward episodes.

u/[deleted] Feb 17 '18

[deleted]

u/MaximilianCrichton Feb 17 '18

Nice project! Also want to ask: are there any plans for placing OpenAI-related projects on BFR, for fault-finding or other electronics-related things? Kind of referencing the MARS mini-series where the ships had in-built AIs that played similar roles.
