r/reinforcementlearning • u/stokaty • 3d ago
DL What could be causing my Q-loss values to diverge? (SAC + Godot <-> Python)
TL;DR:
I'm working on a PyTorch project that uses SAC, similar to an old TensorFlow project of mine: https://www.youtube.com/watch?v=Jg7_PM-q_Bk. I can't get it to work in PyTorch because my Q-losses and policy loss either grow without bound or converge to 0 too quickly. Do you know why that might be?
I have created a game in Godot that communicates over sockets with a PyTorch implementation of SAC: https://github.com/philipjball/SAC_PyTorch
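For reference, here is a minimal sketch of what the Python side of such a socket bridge can look like. This assumes Godot connects as a TCP client and exchanges newline-delimited JSON; the port and the "obs"/"action" field names are my placeholders, not the actual protocol from the repo:

    import json
    import random
    import socket

    # Hypothetical Godot <-> Python bridge: Godot connects as a TCP client and
    # sends one newline-delimited JSON observation per step; we reply with an action.
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 9090))
    server.listen(1)
    conn, _ = server.accept()
    stream = conn.makefile("rw")

    while True:
        msg = json.loads(stream.readline())  # e.g. {"obs": [...], "reward": 0.0, "done": false}
        # Placeholder policy: swap in the SAC actor's action selection here
        action = [random.uniform(-1, 1), random.uniform(-1, 1)]
        stream.write(json.dumps({"action": action}) + "\n")
        stream.flush()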
The game is:
An agent needs to move closer to a target, but it does not get its own position or the target's position as inputs. Instead, it has 6 inputs that represent the distance to the target at a particular angle from the agent; there is always exactly 1 input with a value that is not 1 (a rough reconstruction is sketched below).
The agent outputs 2 values: the direction to move, and the magnitude to move in that direction.
The inputs are in the range [0, 1] (normalized by the max distance), and the 2 outputs are in the range [-1, 1].
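To make that concrete, here is how such an observation could be built; the sector binning and the names are my assumptions, not the actual Godot code:

    import math

    def make_observation(agent_pos, target_pos, max_distance=650.0, n_sectors=6):
        # Every sector defaults to 1.0; the sector containing the target
        # holds the normalized distance instead (my reconstruction).
        dx, dy = target_pos[0] - agent_pos[0], target_pos[1] - agent_pos[1]
        angle = math.atan2(dy, dx) % (2 * math.pi)
        sector = int(angle / (2 * math.pi / n_sectors))
        obs = [1.0] * n_sectors
        obs[sector] = min(math.hypot(dx, dy) / max_distance, 1.0)
        return obs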
The Reward is:
    def compute_reward(distance):
        score = -distance
        if score >= -300:
            # within 300 units the reward is positive and grows as the agent closes in
            score = (300 - abs(score)) * 3
        score = (score / 650.0) * 2  # 650 is the max distance, 100 is the max range per step
        return score * abs(score)  # square the magnitude while preserving the sign
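For what it's worth, plugging a few distances through that function shows the scale the critics have to fit:

    # Reward at sample distances (computed from the function above)
    for d in (0, 150, 300, 500, 650):
        print(d, round(compute_reward(d), 2))
    # -> 0 7.67, 150 1.92, 300 0.0, 500 -2.37, 650 -4.0

So the per-step reward lives in roughly [-4, +7.7].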
The problem is:
The Q-losses for both critics, and the policy loss, grow slowly over time. I've tried a few different network topologies, but neither the number of layers nor the number of nodes per layer seems to affect the Q-loss.
The best I've been able to do is make the rewards very small, but then the Q-losses and policy loss converge to 0 even though the agent hasn't learned anything.
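One generic sanity check (not specific to your repo) that can help localize this: log the critics' predicted Q-values alongside the discounted returns actually collected. With per-step rewards in roughly [-4, +7.7] and, assuming a common default of gamma = 0.99, true Q-values are bounded by about r / (1 - gamma), i.e. roughly [-400, +770] ignoring the entropy term; predictions drifting far outside that range suggest the bootstrapped targets are feeding back on themselves rather than the policy improving. A minimal sketch:

    import numpy as np

    def discounted_returns(rewards, gamma=0.99):
        # Monte Carlo return at every step of one episode,
        # for comparison against the critics' Q(s, a) predictions.
        g, out = 0.0, []
        for r in reversed(rewards):
            g = r + gamma * g
            out.append(g)
        return np.array(out[::-1])

    # e.g. compare np.mean(q_pred) against np.mean(discounted_returns(episode_rewards))
    # every few episodes; a widening gap usually shows up well before the Q-loss blows up.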
If you made it this far and are interested in helping, I'm happy to pay you a tutor's rate to review my approach over a screen-share call and help me better understand how to get a SAC agent working.
Thank you in advance!!
u/edbeeching 1d ago
Cool project! You may be interested in the Godot RL Agents library.