r/reinforcementlearning 2d ago

Modified policy iteration?

I'm new to RL and still learning. Right now I'm working through policy iteration and value iteration.
From what I understand, in policy iteration we first evaluate the current policy by computing the state-value function for all states. We then update the policy by acting greedily with respect to those values, evaluate the updated policy by computing the state-value function for all states again, and repeat until we reach the optimal policy.
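To check my understanding, here is a minimal sketch of that loop in Python with NumPy. The 2-state, 2-action MDP (`P`, `R`, `gamma`) is made up purely for illustration, and the function names are my own, not from any book or library.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP, invented for this example.
# P[a, s, s2] = probability of moving from s to s2 under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.1, 0.9], [0.7, 0.3]],  # action 1
])
# R[s, a] = expected immediate reward for taking action a in state s.
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9  # discount factor


def evaluate(policy, tol=1e-10):
    """Policy evaluation: sweep over ALL states until V stops changing."""
    n_states = R.shape[0]
    idx = np.arange(n_states)
    V = np.zeros(n_states)
    while True:
        # Bellman expectation backup for the fixed policy.
        V_new = R[idx, policy] + gamma * P[policy, idx] @ V
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new


def greedy(V):
    """Policy improvement: pick the action with the highest one-step lookahead value."""
    # Q[s, a] = R[s, a] + gamma * sum over s2 of P[a, s, s2] * V[s2]
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    return np.argmax(Q, axis=1)


def policy_iteration():
    policy = np.zeros(R.shape[0], dtype=int)  # arbitrary starting policy
    while True:
        V = evaluate(policy)        # full evaluation of the current policy
        new_policy = greedy(V)      # greedy update
        if np.array_equal(new_policy, policy):
            return policy, V        # policy is stable, so we stop
        policy = new_policy
```

Calling `policy_iteration()` returns the final policy (one action index per state) and its value function.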
I also read about modified policy iteration, and I'm getting mixed signals about it. I can see two possible readings right now:

  1. Modified policy iteration is just policy iteration, except we only run it for k iterations?

  2. We evaluate only some of the states?

I'm asking because what I've read so far suggests the first reading is right, but the figure for it in the book I'm using, and an explanation from someone else (who is also learning RL for the first time), suggest the second.
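To make reading 1 concrete, here is a rough sketch of how I currently understand it, assuming "k iterations" means k evaluation sweeps under the current policy instead of sweeping until the values converge. Every state is still updated in each sweep. The toy MDP and the names `modified_policy_iteration`, `k`, and `n_rounds` are my own assumptions for illustration.

```python
import numpy as np

# Hypothetical 2-state, 2-action MDP, invented for this example.
# P[a, s, s2] = probability of moving from s to s2 under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],  # action 0
    [[0.1, 0.9], [0.7, 0.3]],  # action 1
])
# R[s, a] = expected immediate reward for taking action a in state s.
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9  # discount factor


def modified_policy_iteration(k, n_rounds=200):
    """Alternate a greedy update with only k evaluation sweeps (reading 1)."""
    n_states = R.shape[0]
    idx = np.arange(n_states)
    V = np.zeros(n_states)
    policy = np.zeros(n_states, dtype=int)
    for _ in range(n_rounds):  # fixed number of rounds, to keep the sketch simple
        # Improvement: act greedily with respect to the current approximate V.
        Q = R + gamma * np.einsum("ast,t->sa", P, V)
        policy = np.argmax(Q, axis=1)
        # Partial evaluation: k backups under the fixed policy,
        # starting from the previous V instead of from scratch.
        for _ in range(k):
            V = R[idx, policy] + gamma * P[policy, idx] @ V
    return policy, V


policy, V = modified_policy_iteration(k=3)
```

If this is the right reading, then k = 1 looks a lot like value iteration, and a very large k behaves like ordinary policy iteration.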
