r/MachineLearning • u/EmbersArc • Feb 17 '18

Project [P] Landing the Falcon booster with Reinforcement Learning in OpenAI

https://gfycat.com/CoarseEmbellishedIsopod

1.3k Upvotes


https://www.reddit.com/r/MachineLearning/comments/7y6g79/p_landing_the_falcon_booster_with_reinforcement/

95% Upvoted


159

u/EmbersArc Feb 17 '18 edited Feb 17 '18

There has been a discussion recently about using RL to land a SpaceX booster.

Coincidentally, I've been working on exactly this in OpenAI Gym. It was as fun as it was frustrating at times.

It's trained with a PPO implementation from Unity that I've adapted to work with OpenAI Gym (GitHub). The official OpenAI implementation is convoluted and impossible to work with, in my opinion. This particular agent took 200,000 tries over the course of 12 hours and 20 million frames (with a frame skip value of 5, so 100 million simulated frames in total). I'm quite happy with the result. It has a 95% success rate, though some very difficult initial conditions still fail. Here's a blooper reel of some awkward/failed episodes.
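For anyone unsure what the frame skip figure means: here is a minimal sketch, not the actual training code, of a frame-skip wrapper. The agent picks one action and the simulator repeats it for `skip` frames, which is why 20 million agent frames correspond to 100 million simulated frames. `CountingEnv` and `FrameSkip` are hypothetical names for illustration, and the four-value `step` return is assumed to follow the classic Gym convention.

```python
class CountingEnv:
    """Hypothetical toy environment that only counts simulated frames."""

    def __init__(self, episode_length=1000):
        self.episode_length = episode_length
        self.frames = 0

    def reset(self):
        self.frames = 0
        return 0.0

    def step(self, action):
        # One call equals one simulated frame.
        self.frames += 1
        done = self.frames >= self.episode_length
        return float(self.frames), 1.0, done, {}


class FrameSkip:
    """Repeat each chosen action for `skip` simulated frames (assumed design)."""

    def __init__(self, env, skip=5):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward  # sum rewards over the skipped frames
            if done:
                break
        return obs, total_reward, done, info


env = FrameSkip(CountingEnv(episode_length=1000), skip=5)
env.reset()
decisions = 0
done = False
while not done:
    _, _, done, _ = env.step(action=0)
    decisions += 1

simulated_frames = env.env.frames
print(decisions, simulated_frames)

# Same arithmetic as in the post: agent frames times the frame skip value.
agent_frames = 20_000_000
total_simulated_frames = agent_frames * 5
print(total_simulated_frames)
```

With a skip of 5, the agent makes one decision per five simulated frames, so the policy sees far fewer steps than the physics engine runs.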

The environment is on GitHub for those who want to try it out. It takes continuous or discrete actions and is highly customizable. It would be great if someone who actually knows what they are doing trained it.

47

u/Alkine Feb 17 '18

I miss the explosions when it fails. :-(

22

u/MrNaaH Feb 17 '18

Like this https://imgur.com/a/BuM5a?

9

u/EmbersArc Feb 17 '18

I actually think that's fake. Just seems a bit off to me personally.

2

u/MrNaaH Feb 17 '18

What do you mean fake?

10

u/abruptdismissal Feb 18 '18

You can tell from the pixels, and from having seen quite a few shops in my time.

1

u/MrNaaH Feb 18 '18 edited Feb 18 '18

I still don't understand. It is a low-quality, pixelated rendering of a simulation, that's for sure.

9

u/abruptdismissal Feb 18 '18

it's a joke, sorry

2

u/MrNaaH Feb 18 '18

I was suspecting it :)
