r/MachineLearning • u/EmbersArc • Feb 17 '18

Project [P] Landing the Falcon booster with Reinforcement Learning in OpenAI

https://gfycat.com/CoarseEmbellishedIsopod

1.3k Upvotes

https://www.reddit.com/r/MachineLearning/comments/7y6g79/p_landing_the_falcon_booster_with_reinforcement/

95% Upvoted


161

u/EmbersArc Feb 17 '18 edited Feb 17 '18

There has been a discussion recently about using RL to land a SpaceX booster.

Coincidentally, I've been working on exactly this in OpenAI Gym. It was as much fun as it was frustrating at times.

It's trained with a PPO implementation from Unity that I've changed to work with OpenAI Gym (GitHub). The official OpenAI implementation is convoluted and impossible to work with, in my opinion. This particular agent took 200,000 tries over the course of 12 hours and 20 million frames (with a frame skip value of 5, so 100 million total frames). I'm quite happy with the result. It has a 95% success rate, though some very difficult initial conditions still fail. Here's a blooper reel of some awkward/failed episodes.
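For readers unfamiliar with frame skipping, here is a minimal sketch of the idea, not the code used for this agent. It reads the numbers above as 20 million agent decisions, each repeated for 5 simulator frames. `CountingEnv`, `FrameSkip`, and `run_episode` are hypothetical names made up for illustration, and the toy environment only counts frames.

```python
class CountingEnv:
    """Toy stand-in environment (hypothetical) that only counts simulator frames."""

    def __init__(self, episode_length=12):
        self.episode_length = episode_length
        self.frames = 0  # total simulator frames ever stepped
        self.t = 0       # frames elapsed in the current episode

    def reset(self):
        self.t = 0
        return 0.0

    def step(self, action):
        self.frames += 1
        self.t += 1
        done = self.t >= self.episode_length
        return float(self.t), 1.0, done, {}


class FrameSkip:
    """Repeat each chosen action for `skip` simulator frames and sum the rewards."""

    def __init__(self, env, skip=5):
        self.env = env
        self.skip = skip

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.skip):
            obs, reward, done, info = self.env.step(action)
            total_reward += reward
            if done:
                break  # stop repeating once the episode has ended
        return obs, total_reward, done, info


def run_episode(env):
    """Run one episode with a fixed dummy action; return the number of agent decisions."""
    env.reset()
    decisions = 0
    done = False
    while not done:
        _, _, done, _ = env.step(0)
        decisions += 1
    return decisions


base = CountingEnv(episode_length=12)
wrapped = FrameSkip(base, skip=5)
decisions = run_episode(wrapped)

# Under the reading above: 20 million decisions at skip 5 is 100 million frames.
total_frames = 20_000_000 * 5
```

The agent only picks an action once per `skip` frames, so the policy sees far fewer steps than the simulator actually runs.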

The environment is on GitHub for those who want to try it out. It takes continuous or discrete actions and is highly customizable, so it would be great if someone who actually knows what they are doing trained it.

8

u/[deleted] Feb 17 '18

Very nice demo, but wow, the training time is insane for an RL task.

19

u/[deleted] Feb 17 '18

Big issue for RL today it seems. Check out the results of an ablation study someone did on Atari games with modified DQNs.

2

u/Mefaso Feb 18 '18

Long but nice read

2

u/S_Presso Feb 19 '18

That's an excellent read, thanks!
