test(value_learning): add stop_target_gradients regression tests #162

Open
Sumu004 wants to merge 2 commits into google-deepmind:main from Sumu004:test/stop-target-gradients-value-learning
Sumu004 commented Jun 4, 2026

What does this PR do?

Adds StopTargetGradientsDefaultTest to value_learning_test.py — the gradient-behaviour counterpart of the tests added in #161 for multistep_test.py.

All value_learning functions correctly default to stop_target_gradients=True, which means gradients do not flow through bootstrap targets (matching vtrace.py). However, there were no tests verifying this at the gradient level — the existing tests only check forward-computation correctness.

New tests for td_learning, sarsa, and q_learning:

  1. Default blocks gradient: grad(output wrt v_t/q_t) is exactly zero when using the default
  2. Explicit False passes gradient: the opt-in meta-gradient path still works
  3. Forward values unchanged: stop_gradient is transparent in the forward pass

These tests would catch a regression if the defaults were ever accidentally flipped (as happened in multistep.py before #161).
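
The three checks listed above can be sketched in JAX as follows. This is a minimal illustration, not the test code from this PR: `td_error` is a hypothetical stand-in written here to mimic a TD-learning function with a `stop_target_gradients` flag, and the scalar inputs are made-up values.

```python
import jax
import jax.numpy as jnp


def td_error(v_tm1, r_t, discount_t, v_t, stop_target_gradients=True):
  """Hypothetical TD-error stand-in, not the library's implementation."""
  target = r_t + discount_t * v_t  # Bootstrap target.
  if stop_target_gradients:
    # Treat the target as a constant during differentiation.
    target = jax.lax.stop_gradient(target)
  return target - v_tm1


# Illustrative scalar inputs (assumed values, chosen for readability).
v_tm1 = jnp.float32(1.0)
r_t = jnp.float32(0.5)
discount_t = jnp.float32(0.9)
v_t = jnp.float32(2.0)

# 1. Default: the gradient with respect to v_t is exactly zero.
grad_default = jax.grad(td_error, argnums=3)(v_tm1, r_t, discount_t, v_t)

# 2. Explicit False: the gradient flows and equals the discount.
grad_opt_in = jax.grad(
    lambda v: td_error(v_tm1, r_t, discount_t, v, stop_target_gradients=False)
)(v_t)

# 3. The forward value is the same with either setting.
out_default = td_error(v_tm1, r_t, discount_t, v_t)
out_opt_in = td_error(v_tm1, r_t, discount_t, v_t, stop_target_gradients=False)
```

Differentiating with respect to the bootstrap value (rather than `v_tm1`) is what isolates the flag: the gradient with respect to `v_tm1` is nonzero in both cases, so it cannot detect a flipped default.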

Sumu004 added 2 commits June 4, 2026 23:17
value_learning functions (td_learning, sarsa, q_learning) correctly
default to stop_target_gradients=True — matching vtrace.py — but had
no tests verifying that the gradient is actually blocked.

Add StopTargetGradientsDefaultTest covering:
  - Default (True): gradient wrt bootstrap target (v_t / q_t) is zero
  - Explicit False: gradient does flow (opt-in meta-gradient path)
  - Forward values are identical regardless of the flag (stop_gradient
    is transparent in forward computation)

These tests are the value_learning counterpart of the regression tests
added to multistep_test.py in PR google-deepmind#161, completing the coverage story.
…ests

- Break inline lambda-style defs onto two lines (C0321 multiple-statements)
- Shorten StopTargetGradientsDefaultTest docstring to fit 80 chars (C0301)