Suppose the best path found when playing the oracle contains these two grid2op actions:
grid2op_A = action_1 + action_2
and grid2op_B = action_1 + action_2 + action_3
If grid2op_B is played at timestep t and grid2op_A at timestep t+1, replay will play:
- at t, action_1 + action_2 + action_3, so the grid will be in config_t = (topo_init + action_1 + action_2 + action_3)
- at t+1, action_1 + action_2, so the grid will be in (config_t + action_1 + action_2) = config_t instead of the expected config_{t+1} = (topo_init + action_1 + action_2). action_3 was never undone.
We need a test for this: at each timestep, the topo_vect derived from the graph_node configurations and the topo_vect obtained from the replay agent should be the same.
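
To make the failure mode concrete, here is a minimal sketch in plain Python. It does not use the grid2op API: topologies and actions are modelled as dicts mapping an element name to a bus id, and the names `TOPO_INIT`, `combine`, `apply_action` and the `line_*` elements are hypothetical, chosen only for illustration. It assumes actions behave like "set" actions applied on top of the current grid state, which is the behaviour described above.

```python
# Toy model (assumption): a topology maps element name -> bus id,
# and an action sets the bus of some elements ("set"-style semantics).
TOPO_INIT = {"line_a": 1, "line_b": 1, "line_c": 1}
ACTION_1 = {"line_a": 2}
ACTION_2 = {"line_b": 2}
ACTION_3 = {"line_c": 2}


def combine(*actions):
    """Merge several unitary actions into one action."""
    merged = {}
    for action in actions:
        merged.update(action)
    return merged


def apply_action(topo, action):
    """Return a new topology with the action applied on top of topo."""
    new_topo = dict(topo)
    new_topo.update(action)
    return new_topo


GRID2OP_A = combine(ACTION_1, ACTION_2)
GRID2OP_B = combine(ACTION_1, ACTION_2, ACTION_3)
best_path = [GRID2OP_B, GRID2OP_A]  # B at t, A at t+1

# What the graph nodes describe: each configuration is built from topo_init.
expected = [apply_action(TOPO_INIT, action) for action in best_path]

# What replay does: each action is applied on top of the previous grid state.
replayed = []
topo = TOPO_INIT
for action in best_path:
    topo = apply_action(topo, action)
    replayed.append(topo)

# Timesteps (relative to t) where the two topologies disagree.
mismatches = [i for i, (e, r) in enumerate(zip(expected, replayed)) if e != r]
print(mismatches)
```

In this toy model the two topologies agree at t and diverge at t+1, because `line_c` stays on bus 2 in the replayed grid. A real test would compare the actual topo_vect values in the same per-timestep way.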