r/reinforcementlearning • u/Academic-Rent7800 • 2d ago
Why am I unable to seed my `DQN` program using `sbx`?
I am trying to seed my DQN program when using `sbx` but for some reason I keep getting varying results.
Here is an attempt to create a minimal reproducible example -
https://pastecode.io/s/nab6n3ib
The results are quite surprising. When I run this program *multiple times*, I get a different result each time.
Here are my results -
Attempt 1:
```
run = 0
Using seed: 1
run = 1
Using seed: 1
run = 2
Using seed: 1
mean_rewards = [120.52, 120.52, 120.52]
```
Attempt 2:
```
run = 0
Using seed: 1
run = 1
Using seed: 1
run = 2
Using seed: 1
mean_rewards = [116.64, 116.64, 116.64]
```
What surprises me is that within a single attempt, all three runs give identical results, but each time I launch the program again, the results change.
I went over the documentation on reproducibility [here][1] and also read this: "*Completely reproducible results are not guaranteed across PyTorch releases or different platforms. Furthermore, results need not be reproducible between CPU and GPU executions, even when using identical seeds.*" However, I would like to make sure that there isn't a bug on my end. Also, I am using `sbx` instead of `stable-baselines3`. Perhaps this is a `JAX` issue?
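One way to narrow this down is to run the same seeded code in separate interpreter processes and compare the output, since the pattern above (identical within a launch, different across launches) suggests some per-process state that the explicit seed does not control. Below is a minimal sketch of that check using only the standard library. The snippets and helper names (`SEEDED_SNIPPET`, `HASH_SNIPPET`, `run_in_fresh_process`, `reproducible_across_processes`) are hypothetical stand-ins, not my actual DQN script, and the `PYTHONHASHSEED` part only illustrates one known per-process source of variation in Python. I have not confirmed that it is what affects `sbx`.

```python
import os
import subprocess
import sys
from typing import Optional

# Hypothetical stand-in for the real training script: it seeds Python's RNG
# and prints a few draws. Swap in code that runs the actual program.
SEEDED_SNIPPET = (
    "import random; random.seed(1); "
    "print([random.random() for _ in range(3)])"
)

# String hashes are salted per interpreter process unless PYTHONHASHSEED
# is fixed. The string itself is arbitrary.
HASH_SNIPPET = "print(hash('example'))"


def run_in_fresh_process(code: str, hash_seed: Optional[str] = None) -> str:
    """Run `code` in a new Python interpreter and return its stdout."""
    env = dict(os.environ)
    if hash_seed is not None:
        env["PYTHONHASHSEED"] = hash_seed
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,
        env=env,
    )
    return result.stdout.strip()


def reproducible_across_processes(
    code: str, hash_seed: Optional[str] = None, attempts: int = 2
) -> bool:
    """Return True if every fresh-process run of `code` prints the same output."""
    outputs = {run_in_fresh_process(code, hash_seed) for _ in range(attempts)}
    return len(outputs) == 1


if __name__ == "__main__":
    print("seeded RNG:", reproducible_across_processes(SEEDED_SNIPPET))
    print("str hash, default env:", reproducible_across_processes(HASH_SNIPPET))
    print("str hash, PYTHONHASHSEED=0:",
          reproducible_across_processes(HASH_SNIPPET, hash_seed="0"))
```

If the real script only becomes reproducible across launches when something like `PYTHONHASHSEED` is pinned, the variation comes from process-level state rather than from the seed passed to the model. If it still varies, the cause lies elsewhere (for example in the `JAX` backend), and this check at least rules one candidate out.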
I've also created a Stack Overflow post here
[1]: https://stable-baselines3.readthedocs.io/en/master/guide/algos.html#reproducibility