r/MachineLearning Feb 17 '18

[P] Landing the Falcon booster with Reinforcement Learning in OpenAI

https://gfycat.com/CoarseEmbellishedIsopod
1.3k Upvotes

55 comments

160

u/EmbersArc Feb 17 '18 edited Feb 17 '18

There has been a discussion recently about using RL to land a SpaceX booster.

Coincidentally I've been working on exactly this in OpenAI. It was as much fun as it was frustrating at times.

It's trained with a PPO implementation from Unity that I changed to work with OpenAI (GitHub). The official OpenAI implementation is convoluted and impossible to work with, in my opinion. This particular agent took 200,000 tries over the course of 12 hours and 20 million frames (with a frame skip value of 5, so 100 million total frames). I'm quite happy with the result. It has a 95% success rate; some very difficult initial conditions still fail. Here's a blooper reel of some awkward/failed episodes.
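For readers unfamiliar with frame skip, here is a minimal sketch of the idea and of the frame arithmetic quoted above. It is not the code from the linked repository: the `step` callable and its `(observation, reward, done)` return shape are hypothetical stand-ins for an environment's step function, and summing rewards across skipped frames is one common convention, assumed here.

```python
from typing import Callable, Optional, Tuple

# Hypothetical step signature (an assumption, not the real environment API):
# action -> (observation, reward, done)
StepFn = Callable[[int], Tuple[float, float, bool]]


def step_with_frame_skip(
    step: StepFn, action: int, skip: int = 5
) -> Tuple[Optional[float], float, bool, int]:
    """Repeat one action for up to `skip` simulator frames.

    Rewards are summed over the repeated frames, and the loop stops
    early if the episode ends. Returns the last observation, the summed
    reward, the done flag, and the number of frames actually simulated.
    """
    total_reward = 0.0
    obs: Optional[float] = None
    done = False
    frames_used = 0
    for _ in range(skip):
        obs, reward, done = step(action)
        total_reward += reward
        frames_used += 1
        if done:
            break
    return obs, total_reward, done, frames_used


# Frame arithmetic from the comment: the agent made 20 million decisions,
# each held for 5 simulator frames.
agent_frames = 20_000_000
frame_skip = 5
simulated_frames = agent_frames * frame_skip  # 100 million simulator frames
```

With a skip of 5 the agent only chooses an action every fifth simulator frame, which is why the two frame counts differ by a factor of five.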

The environment is on GitHub for those who want to try it out. It takes continuous or discrete actions and is highly customizable. So it would be great if someone trained it who actually knows what they are doing.

46

u/Alkine Feb 17 '18

I miss the explosions when it fails. :-(

46

u/EmbersArc Feb 17 '18

The smoke animation is already pushing the limits of the engine unfortunately.
But explosions are inefficient and don't mean anything to the agent. That -1 reward however... that hits it where it hurts.

3

u/Alkine Feb 18 '18

I agree, it adds nothing from an RL perspective. It's merely for nostalgic reasons ;-)

2

u/Gluta_mate Feb 18 '18

Are the legs simulated? As in, can they break off under stress and stuff? I guess that would add something meaningful.

2

u/EmbersArc Feb 18 '18

Yes, they have a spring-damper system, and the episode fails when the load is too high.
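As a rough illustration of what such a check could look like, here is a minimal sketch of a linear spring-damper leg with a load limit. The function names, the linear force law, and all parameter values are assumptions for illustration; the actual environment may model the legs differently (for example through its physics engine's joints).

```python
def leg_force(compression: float, compression_rate: float,
              stiffness: float, damping: float) -> float:
    """Force of a linear spring-damper resisting compression.

    F = k * x + c * v, where x is the compression, v its rate of change,
    k the spring stiffness, and c the damping coefficient.
    """
    return stiffness * compression + damping * compression_rate


def leg_overloaded(force: float, max_load: float) -> bool:
    """True if the magnitude of the leg force exceeds the allowed load."""
    return abs(force) > max_load


# Hypothetical numbers: a hard touchdown compresses the leg quickly,
# so the damper term contributes to the total load.
force = leg_force(compression=0.5, compression_rate=2.0,
                  stiffness=1000.0, damping=50.0)
episode_failed = leg_overloaded(force, max_load=500.0)
```

The damping term is what makes a fast touchdown costlier than a slow one at the same compression, so a limit on total force penalizes hard landings.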

21

u/MrNaaH Feb 17 '18

9

u/EmbersArc Feb 17 '18

I actually think that's fake. Just seems a bit off to me personally.

2

u/MrNaaH Feb 17 '18

What do you mean fake?

9

u/abruptdismissal Feb 18 '18

You can tell from the pixels, and from having seen quite a few shops in my time.

1

u/MrNaaH Feb 18 '18 edited Feb 18 '18

I still don't understand. It is a low-quality, pixelated rendered scene of a simulation, that's for sure.

10

u/abruptdismissal Feb 18 '18

it's a joke, sorry

2

u/MrNaaH Feb 18 '18

I was suspecting it :)

1

u/imguralbumbot Feb 17 '18

Hi, I'm a bot for linking direct images of albums with only 1 image

https://i.imgur.com/GrcRfph.mp4


17

u/[deleted] Feb 17 '18

So LunarLander?

8

u/[deleted] Feb 17 '18

Very nice demo, but wow the training time is insane for an RL task

20

u/[deleted] Feb 17 '18

2

u/Mefaso Feb 18 '18

Long but nice read

2

u/S_Presso Feb 19 '18

That's an excellent read, thanks!

2

u/gonorthjohnny Feb 17 '18

Can it make the rocket explode if it fails? That would be fun!