test(value_learning): add stop_target_gradients regression tests by Sumu004 · Pull Request #162 · google-deepmind/rlax

Sumu004 · 2026-06-04T22:17:18Z

What does this PR do?

Adds StopTargetGradientsDefaultTest to value_learning_test.py — the gradient-behaviour counterpart of the tests added in #161 for multistep_test.py.

All value_learning functions correctly default to stop_target_gradients=True, which means gradients do not flow through bootstrap targets (matching vtrace.py). However, there were no tests verifying this at the gradient level — the existing tests only check forward-computation correctness.
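To make "gradients do not flow through bootstrap targets" concrete, here is a minimal sketch in JAX. `td_error_sketch` is a hypothetical stand-in written for illustration, not the rlax implementation; it assumes the TD error is `r_t + discount_t * v_t - v_tm1` and that the flag wraps the target in `jax.lax.stop_gradient`.

```python
import jax


def td_error_sketch(v_tm1, r_t, discount_t, v_t, stop_target_gradients=True):
  """Hypothetical stand-in for a TD error with an optional gradient stop."""
  target = r_t + discount_t * v_t  # Bootstrap target.
  if stop_target_gradients:
    # Identity in the forward pass, zero gradient in the backward pass.
    target = jax.lax.stop_gradient(target)
  return target - v_tm1


v_tm1, r_t, discount_t, v_t = 1.0, 0.5, 0.9, 2.0

# Gradient of the TD error with respect to the bootstrap value v_t (argnums=3).
grad_default = jax.grad(td_error_sketch, argnums=3)(v_tm1, r_t, discount_t, v_t)


def opt_in(v):
  # Same function, but with the gradient stop explicitly disabled.
  return td_error_sketch(v_tm1, r_t, discount_t, v, stop_target_gradients=False)


grad_opt_in = jax.grad(opt_in)(v_t)

# The forward value should not depend on the flag.
forward_default = td_error_sketch(v_tm1, r_t, discount_t, v_t)
forward_opt_in = opt_in(v_t)
```

Under these assumptions, `grad_default` is zero, `grad_opt_in` equals `discount_t`, and the two forward values agree.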

New tests for td_learning, sarsa, and q_learning:

- Default blocks gradient — grad(output wrt v_t/q_t) is exactly zero when using the default
- Explicit False passes gradient — the opt-in meta-gradient path still works
- Forward values unchanged — stop_gradient is transparent in the forward pass
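The three checks listed above could be laid out roughly as follows. This is a sketch, not the PR's test code: `q_learning_sketch` and `StopTargetGradientsSketchTest` are hypothetical names, the stand-in assumes a single-transition Q-learning error of the form `r_t + discount_t * max(q_t) - q_tm1[a_tm1]`, and plain `unittest` is used so the example stays self-contained.

```python
import unittest

import jax
import jax.numpy as jnp
import numpy as np


def q_learning_sketch(q_tm1, a_tm1, r_t, discount_t, q_t,
                      stop_target_gradients=True):
  """Hypothetical stand-in: one-step Q-learning TD error for one transition."""
  target = r_t + discount_t * jnp.max(q_t)
  if stop_target_gradients:
    target = jax.lax.stop_gradient(target)
  return target - q_tm1[a_tm1]


class StopTargetGradientsSketchTest(unittest.TestCase):
  """Gradient-level checks for the stop_target_gradients flag (sketch)."""

  def setUp(self):
    super().setUp()
    self.q_tm1 = jnp.array([0.5, 1.5, -1.0])
    self.q_t = jnp.array([1.0, 3.0, 2.0])  # Unique max at index 1.
    self.a_tm1, self.r_t, self.discount_t = 1, 0.25, 0.9

  def _error(self, q_t, **kwargs):
    return q_learning_sketch(
        self.q_tm1, self.a_tm1, self.r_t, self.discount_t, q_t, **kwargs)

  def test_default_blocks_gradient(self):
    # With the default flag, nothing flows back into q_t.
    grad_q_t = jax.grad(self._error)(self.q_t)
    np.testing.assert_array_equal(grad_q_t, np.zeros(3))

  def test_explicit_false_passes_gradient(self):
    def fn(q_t):
      return self._error(q_t, stop_target_gradients=False)
    # The gradient reaches the maximising action, scaled by the discount.
    grad_q_t = jax.grad(fn)(self.q_t)
    np.testing.assert_allclose(grad_q_t, [0.0, 0.9, 0.0], rtol=1e-6)

  def test_forward_values_unchanged(self):
    np.testing.assert_allclose(
        self._error(self.q_t),
        self._error(self.q_t, stop_target_gradients=False))


# Run the suite programmatically so the script does not call sys.exit().
suite = unittest.defaultTestLoader.loadTestsFromTestCase(
    StopTargetGradientsSketchTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

If the default in the stand-in were changed to `False`, `test_default_blocks_gradient` would fail, which is the kind of regression these checks are meant to surface.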

These tests would catch a regression if the defaults were ever flipped by accident (as happened in multistep.py before #161).

value_learning functions (td_learning, sarsa, q_learning) correctly default to stop_target_gradients=True, matching vtrace.py, but had no tests verifying that the gradient is actually blocked.

Add StopTargetGradientsDefaultTest covering:
- Default (True): gradient wrt bootstrap target (v_t / q_t) is zero
- Explicit False: gradient does flow (opt-in meta-gradient path)
- Forward values are identical regardless of the flag (stop_gradient is transparent in forward computation)

These tests are the value_learning counterpart of the regression tests added to multistep_test.py in PR google-deepmind#161, completing the coverage story.

Sumu004 added 2 commits June 4, 2026 23:17

fix(value_learning_test): fix pylint style violations in regression tests

b59c316

- Break inline lambda-style defs onto two lines (C0321 multiple-statements)
- Shorten StopTargetGradientsDefaultTest docstring to fit 80 chars (C0301)

Sumu004 wants to merge 2 commits into google-deepmind:main from Sumu004:test/stop-target-gradients-value-learning
