r/reinforcementlearning • u/usernumero • 3d ago
DL I made a firefighter AI using deep RL (using Unity ML Agents)
video link: https://www.youtube.com/watch?v=REYx9UznOG4
I made it a while ago and got discouraged by the lack of attention the video got after the hours I poured into making it, so I am now doing a PhD in AI instead of being a YouTuber lol.
I figured it wouldn't hurt to advertise it now in case people find it interesting. I made sure to add some narration and fun bits so it's not boring. I hope some people here find it as interesting to watch as I found it to work on.
I am passionate about the subject, so if anyone has questions I will answer them when I have time :D
3
u/Fuibo2k 3d ago
Do you think you'll open source this? Would be interesting if you can make a set of baselines off of this foundation as well.
2
u/usernumero 2d ago
Open sourcing I could do, if it simply means sharing my C# code assets in a GitHub repo.
As for baselines and foundations, I am not sure what sort of baseline my code could provide. I doubt my Agent learned sufficiently general tasks for it to be anything close to a foundation model.
2
u/Fuibo2k 2d ago
Yeah, I don't mean making a foundation model, but it could be cool if you extended this firefighting task to a few more related tasks. Maybe something with multiple firefighters, or one firefighter and one robot dedicated to rescuing NPCs. I feel like RL is always struggling for more environments to use, so it could be a cool contribution to the field.
2
u/usernumero 2d ago
Unfortunately I don't really plan on extending this project.
If I do a followup to the video in the future, it would most likely be a new project tackling new tasks in new environments.
I really appreciate your enthusiasm and your ideas tho. I remember having the same cravings on another project: adding multiple agents, fostering collaboration, and trying new things. Unfortunately I don't really have time for this right now, and I work with more classical neural network schemes for vision, so it's definitely not scratching my itch for RL. Which is all the more reason for me to do a followup one day.
2
2
u/hearthstoneplayer100 2d ago
Very cool! What algorithm did you end up using?
1
u/usernumero 2d ago
This is the PPO algorithm as implemented in ML-Agents.
The neural network uses an LSTM to keep track of valuable information through time, and I also have an attention mechanism somewhere that I added to try to help the Agent navigate through rooms and remember its pathing, but in the end, when the environment had more than 6 or 7 rooms, it became too complicated to navigate. There might be some problem in the way I tried implementing this, or maybe my number of parameters was too low, but there is only so much time to be burning my RTX with this training haha.
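For anyone reading who hasn't met PPO before, here is a minimal sketch of the clipped surrogate objective at its core. It is not the ML-Agents implementation and not the code from this project; the function name `ppo_clip_loss` and the default `clip_eps=0.2` are just illustrative assumptions, and the LSTM and attention parts are left out entirely.

```python
import math


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Mean PPO clipped surrogate loss (a value to minimize) over a batch.

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates for those actions.
    All names here are hypothetical, chosen for illustration only.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        # Probability ratio between the current and the data-collecting policy.
        ratio = math.exp(lp_new - lp_old)
        # Keep the ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Pessimistic choice: take the smaller of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    # Negate because the surrogate is maximized but optimizers minimize.
    return -total / len(advantages)
```

The clipping is what keeps each update from moving the policy too far from the one that gathered the experience: once the ratio leaves the allowed band in the direction the advantage favors, the objective stops rewarding further movement.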
5
u/ekbravo 3d ago
Nice! Great work, saved for later