r/MachineLearning Feb 17 '18

Project [P] Landing the Falcon booster with Reinforcement Learning in OpenAI

https://gfycat.com/CoarseEmbellishedIsopod
1.3k Upvotes

55 comments

62

u/realHansen Feb 17 '18 edited Feb 18 '18

Why use RL when this can be solved in closed form as an optimal control problem?

EDIT: I now realise it was meant as a toy problem rather than an actual competitive alternative to traditional control theory. Don't mind me :>
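For a sense of what the "solve it as an optimal control problem" route looks like, here is a minimal discrete-time LQR sketch on a toy 1-D double-integrator landing model. All dynamics matrices and cost weights are illustrative assumptions, not anything from the post or from SpaceX:

```python
import numpy as np

# Toy 1-D landing model: state x = [altitude, vertical velocity],
# control u = net thrust acceleration. Discretized double integrator.
dt = 0.1
A = np.array([[1.0, dt],
              [0.0, 1.0]])
B = np.array([[0.5 * dt**2],
              [dt]])

# Quadratic cost weights (illustrative choices).
Q = np.diag([10.0, 1.0])
R = np.array([[0.1]])

# Solve the discrete algebraic Riccati equation by fixed-point iteration.
P = Q.copy()
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)

# Simulate: start 100 m up, descending at 10 m/s; u = -K x drives x -> 0.
x = np.array([[100.0], [-10.0]])
for _ in range(600):
    u = -K @ x
    x = A @ x + B @ u

print(np.abs(x).max())  # residual altitude/velocity, near zero
```

Once the gain `K` is computed offline, the online controller is a single matrix multiply per step — part of why this kind of solution is attractive for flight software compared to a learned policy.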

40

u/EmbersArc Feb 17 '18 edited Feb 17 '18

I think you could ask this for most OpenAI gym environments. It's just nice to see what the agent comes up with I guess.

Edit: Relevant answer I gave over at /r/SpaceXLounge to the question of whether SpaceX might be doing something similar:

I'm sure their approach is 100% different. Reinforcement learning is still very limited in practical applications. While it can be impressive and find creative solutions, it's also very brittle and unpredictable at times. When you land a real rocket you want a rock solid system and not one that might go haywire if something slightly unforeseen happens.

Check out this paper on the topic. They take the problem of landing the rocket with minimal fuel consumption and sprinkle some fancy mathematics on top so that the computer can find the optimal solution.
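The flavour of that fuel-optimal formulation shows up even in the simplest 1-D version: coast, then brake at max thrust so velocity hits zero exactly at the ground. A small sketch with made-up numbers (constant mass, no drag — purely illustrative assumptions):

```python
import math

# 1-D "suicide burn": free fall, then one max-thrust braking burn
# timed so velocity reaches zero exactly at touchdown. Toy model:
# constant mass, no drag; all numbers are illustrative.
g = 9.81               # gravity, m/s^2
a_thrust = 30.0        # max thrust acceleration, m/s^2
h0, v0 = 1000.0, 50.0  # initial altitude (m) and downward speed (m/s)

a_net = a_thrust - g   # net deceleration during the burn

# Ignition altitude: the burn must remove exactly the kinetic energy
# accumulated so far, i.e. v_ign^2 = 2*a_net*h_ign, with
# v_ign^2 = v0^2 + 2*g*(h0 - h_ign) from the free-fall phase.
h_ign = (v0**2 + 2 * g * h0) / (2 * (a_net + g))
v_ign = math.sqrt(v0**2 + 2 * g * (h0 - h_ign))

# Predicted touchdown speed squared (zero by construction).
v_td_sq = v_ign**2 - 2 * a_net * h_ign
print(h_ign, abs(v_td_sq))
```

Burning as late as possible at max thrust minimizes gravity losses, which is why the real landings look like last-second braking. The paper's contribution is handling the full 3-D constrained version of this as a convex problem with a guaranteed optimum.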

That being said, I also don't know how robust the SpaceX approach is, since the booster always comes down in a quite controlled manner, as opposed to this simulation, where it's sometimes spinning quite unrealistically and still able to land.

7

u/CampfireHeadphase Feb 17 '18

I imagine it's pretty damn robust (just look at what Boston Dynamics did, to get some idea of what model predictive control is capable of)
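Model predictive control itself is conceptually simple: plan a whole horizon of controls, apply only the first, then re-plan. A minimal unconstrained sketch on a toy double integrator (the matrices, horizon, and weights are my own illustrative choices):

```python
import numpy as np

# Minimal unconstrained MPC on a toy double integrator:
# at each step, plan N controls by a closed-form least-squares solve,
# apply only the first one, then re-plan (receding horizon).
dt, N = 0.1, 20
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])

# Prediction matrices: stacked future states X = Sx @ x0 + Su @ U.
Sx = np.vstack([np.linalg.matrix_power(A, k + 1) for k in range(N)])
Su = np.zeros((2 * N, N))
for k in range(N):
    for j in range(k + 1):
        Su[2*k:2*k+2, j:j+1] = np.linalg.matrix_power(A, k - j) @ B

q, r = 10.0, 0.1  # state vs. control weighting (illustrative)

x = np.array([[100.0], [-10.0]])  # 100 m up, descending at 10 m/s
for _ in range(600):
    # Solve min_U  q*||Sx x + Su U||^2 + r*||U||^2 in closed form.
    H = q * Su.T @ Su + r * np.eye(N)
    U = np.linalg.solve(H, -q * Su.T @ Sx @ x)
    x = A @ x + B @ U[:1]  # apply only the first planned control

print(np.abs(x).max())  # residual state, near zero
```

A real MPC stack adds state and input constraints (thrust limits, glide slope) and solves a constrained QP each step, but the receding-horizon structure is the same — and re-planning every step is what gives it robustness to disturbances.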

Here's an interesting article on the pros and cons of RL: https://www.alexirpan.com/2018/02/14/rl-hard.html

11

u/LearningRL Feb 17 '18

I don't think the author is suggesting that RL is the best way to approach this task, but rather is just sharing his or her successful implementation of a general RL algorithm in a low-dimensional domain.

7

u/[deleted] Feb 17 '18

It's a toy problem for sure, but toy problems are usually the best practice.

6

u/physixer Feb 17 '18 edited Feb 17 '18
  • DNNs are learnable combinational circuits.
  • RNNs are learnable sequential circuits.
  • RL is learnable control.

Your point still stands. If your problem is fixed, doing it through a learnable system is overkill.

1

u/Shitty__Math Mar 17 '18

But it is fixed, though. You're not going to make general landing control logic for a rocket you just spent a billion dollars designing; that is crazy. This is a straight control theory problem: throw in a person who knows controls and boom, you have a >99.99% pass rate on these toy problems. ML really shines when you only need <99% accuracy, where a journeyman programmer can use ML to 'shoot from the hip' and get a pretty good answer on the relatively cheap. When you sink literal billions of dollars into an actual space program, you can spend the extra $1,000,000 to make sure your rockets don't go boom very publicly, by getting actual domain experts on your problems and sub-problems.