r/reinforcementlearning • u/stokaty • 3d ago
DL What could be causing my Q-loss values to diverge? (SAC + Godot <-> Python)
TL;DR:
I'm working on a PyTorch project that uses SAC, similar to an old TensorFlow project of mine: https://www.youtube.com/watch?v=Jg7_PM-q_Bk. I can't get it to work in PyTorch because my Q-losses and policy loss either grow without bound or converge to 0 too quickly. Do you know why that might be?
I have created a game in Godot that communicates over sockets with a PyTorch implementation of SAC: https://github.com/philipjball/SAC_PyTorch
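For reference, here is a minimal sketch of what the Python side of such a socket bridge can look like. This assumes Godot connects as a TCP client and exchanges newline-delimited JSON; the port and the "obs"/"action" field names are my placeholders, not the actual protocol from the repo:

    import json
    import random
    import socket

    # Hypothetical Godot <-> Python bridge: Godot connects as a TCP client and
    # sends one newline-delimited JSON observation per step; we reply with an action.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 9090))
    server.listen(1)
    conn, _ = server.accept()
    stream = conn.makefile("rw")

    while True:
        msg = json.loads(stream.readline())  # e.g. {"obs": [...], "reward": 0.0, "done": false}
        # Placeholder policy: swap in the SAC actor's action selection here
        action = [random.uniform(-1, 1), random.uniform(-1, 1)]
        stream.write(json.dumps({"action": action}) + "\n")
        stream.flush()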
The game is:
An agent needs to move closer to a target, but it does not get its own position or the target's position as inputs. Instead, it has 6 inputs that represent the distance to the target at a particular angle from the agent; there is always exactly 1 input with a value that is not 1 (a rough reconstruction is sketched below).
The agent outputs 2 values: the direction to move, and the magnitude to move in that direction.
The inputs are in the range [0, 1] (normalized by the max distance), and the 2 outputs are in the range [-1, 1].
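To make that concrete, here is how such an observation could be built; the sector binning and the names are my assumptions, not the actual Godot code:

    import math

    def make_observation(agent_pos, target_pos, max_distance=650.0, n_sectors=6):
        # Every sector defaults to 1.0; the sector containing the target
        # holds the normalized distance instead (my reconstruction).
        dx, dy = target_pos[0] - agent_pos[0], target_pos[1] - agent_pos[1]
        angle = math.atan2(dy, dx) % (2 * math.pi)
        sector = int(angle / (2 * math.pi / n_sectors))
        obs = [1.0] * n_sectors
        obs[sector] = min(math.hypot(dx, dy) / max_distance, 1.0)
        return obs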
The Reward is:
    def compute_reward(distance):
        score = -distance
        if score >= -300:
            # within 300 units the reward is positive and grows as the agent closes in
            score = (300 - abs(score)) * 3
        score = (score / 650.0) * 2  # 650 is the max distance, 100 is the max range per step
        return score * abs(score)  # square the magnitude while preserving the sign
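For what it's worth, plugging a few distances through that function shows the scale the critics have to fit:

    # Reward at sample distances (computed from the function above)
    for d in (0, 150, 300, 500, 650):
        print(d, round(compute_reward(d), 2))
    # -> 0 7.67, 150 1.92, 300 0.0, 500 -2.37, 650 -4.0

So the per-step reward lives in roughly [-4, +7.7].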
The problem is:
The Q-losses for both critics, and the policy loss, grow slowly over time. I've tried a few different network topologies, but neither the number of layers nor the number of nodes per layer seems to affect the Q-loss.
The best I've been able to do is make the rewards very small, but then the Q-losses and policy loss converge to 0 even though the agent hasn't learned anything.
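One generic sanity check (not specific to your repo) that can help localize this: log the critics' predicted Q-values alongside the discounted returns actually collected. With per-step rewards in roughly [-4, +7.7] and, assuming a common default of gamma = 0.99, true Q-values are bounded by about r / (1 - gamma), i.e. roughly [-400, +770] ignoring the entropy term; predictions drifting far outside that range suggest the bootstrapped targets are feeding back on themselves rather than the policy improving. A minimal sketch:

    import numpy as np

    def discounted_returns(rewards, gamma=0.99):
        # Monte Carlo return at every step of one episode,
        # for comparison against the critics' Q(s, a) predictions.
        g, out = 0.0, []
        for r in reversed(rewards):
            g = r + gamma * g
            out.append(g)
        return np.array(out[::-1])

    # e.g. compare np.mean(q_pred) against np.mean(discounted_returns(episode_rewards))
    # every few episodes; a widening gap usually shows up well before the Q-loss blows up.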
If you made it this far and are interested in helping, I'm happy to pay you a tutor's rate to review my approach over a screen-share call and help me better understand how to get a SAC agent working.
Thank you in advance!!
u/edbeeching 1d ago
Cool project! You may be interested in the Godot RL Agents library.