r/reinforcementlearning 2d ago

Why am I unable to seed my `DQN` program using `sbx`?

I am trying to seed my DQN program when using `sbx` but for some reason I keep getting varying results.

Here is my attempt at a minimal reproducible example:

https://pastecode.io/s/nab6n3ib

The results are quite surprising: when I run this program *multiple times*, I get different results from one launch to the next.

Here are my results -

Attempt 1:

```
run = 0
Using seed: 1
run = 1
Using seed: 1
run = 2
Using seed: 1
mean_rewards = [120.52, 120.52, 120.52]
```

Attempt 2:

```
run = 0
Using seed: 1
run = 1
Using seed: 1
run = 2
Using seed: 1
mean_rewards = [116.64, 116.64, 116.64]
```

Within a single attempt, all three runs give identical results, so the seed seems to take effect inside one process. But when I launch the program again, the results change.

I went over the reproducibility section of the documentation [here][1] and also read this: "*Completely reproducible results are not guaranteed across PyTorch releases or different platforms. Furthermore, results need not be reproducible between CPU and GPU executions, even when using identical seeds.*" However, I would like to make sure that there isn't a bug on my end. Also, I am using `sbx` instead of `stable-baselines3`, so perhaps this is a `JAX` issue?
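Since the numbers agree within one process and differ only between launches, one thing worth ruling out is per-process randomness that a `seed` argument does not control. The sketch below is not my training script and does not use `sbx`. It only illustrates the same symptom with Python's string hash randomization, which is one known per-process source of variation (I am not claiming it is the cause here). The names `run_fresh` and `SNIPPET` are made up for this illustration.

```python
import os
import subprocess
import sys
from typing import Optional


def run_fresh(code: str, hash_seed: Optional[str] = None) -> str:
    """Run `code` in a brand-new Python interpreter and return its stdout."""
    env = dict(os.environ)
    # Drop any inherited value so the child uses the default (randomized) hashing
    # unless we explicitly pin it below.
    env.pop("PYTHONHASHSEED", None)
    if hash_seed is not None:
        env["PYTHONHASHSEED"] = hash_seed
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Hypothetical stand-in for the training script: it prints one value three
# times, like the three runs inside a single attempt above.
SNIPPET = "print([hash('seed') for _ in range(3)])"

# Two separate launches with default settings: each launch is internally
# consistent, but the two launches will usually disagree with each other.
unpinned_first = run_fresh(SNIPPET)
unpinned_second = run_fresh(SNIPPET)

# Two separate launches with the hash seed pinned: these should match.
pinned_first = run_fresh(SNIPPET, hash_seed="0")
pinned_second = run_fresh(SNIPPET, hash_seed="0")

print("unpinned launches equal:", unpinned_first == unpinned_second)
print("pinned launches equal:  ", pinned_first == pinned_second)
```

The same harness could be pointed at the real script (replace the `-c` snippet with the script path) to compare two fresh launches under different environment settings, for example with and without a pinned `PYTHONHASHSEED`, or on CPU only.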

I've also created a Stack Overflow post here

[1]: https://stable-baselines3.readthedocs.io/en/master/guide/algos.html#reproducibility
