Skip to content

Fix/per module accumulate iterations#50

Merged
pomonam merged 8 commits into
mainfrom
fix/per-module-accumulate-iterations
May 23, 2026
Merged

Fix/per module accumulate iterations#50
pomonam merged 8 commits into
mainfrom
fix/per-module-accumulate-iterations

Conversation

@pomonam
Copy link
Copy Markdown
Owner

@pomonam pomonam commented May 23, 2026

Summary

  • Fix missing accumulate_iterations for per-module pairwise scores in compute_dot_products_with_loader
    (kronfluence/score/dot_product.py). The call was nested inside the aggregated-scores else branch, leaving PAIRWISE_SCORE_MATRIX_NAME
    storage uncleared between iterations. Benign with the default direct-assignment path (line 86 in pairwise_score.py), but corrupts results
    when a Task sets enable_post_process_per_sample_gradient=True, since that path uses add_ on the stale tensor and per-module scores
    accumulate across iterations.
  • Fix README's quickstart example: eval_dataset was loading the MNIST training split (train=True) instead of the test split.

Test plan

  • Existing tests/scores/test_pairwise_scores.py::test_per_module_scores_equivalence still passes (it doesn't exercise the bug since
    the default task has enable_post_process_per_sample_gradient=False).
  • Optional: add a regression test enabling enable_post_process_per_sample_gradient=True with compute_per_module_scores=True to lock
    in the fix.

One thing to flag: we haven't actually solved the original "build is failing on master" question. The failed CI run's logs had expired, and
locally all tests pass for me on Python 3.10. If you can re-run the failed workflow on GitHub (or push any small commit to retrigger CI),
I can look at the live logs and dig into the real cause.

pomonam and others added 8 commits May 23, 2026 02:15
In compute_dot_products_with_loader, accumulate_iterations was only
called in the aggregated-scores branch, leaving stale per-module
pairwise score state across iterations. This is benign in the default
path (direct assignment overwrites the storage) but corrupts results
when a task sets enable_post_process_per_sample_gradient=True, since
that path uses add_ on the stale tensor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Use nn.Sequential consistently instead of mixing torch.nn.Sequential
- Pass weights_only=True to torch.load to silence the PyTorch 2.6+ warning
- Clarify that MnistTask must be defined by the user (subclass Task)
- Document the "all_modules" key returned by load_pairwise_scores

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- DOCUMENTATION.md: rename task.tracked_modules to task.get_influence_tracked_modules
- DOCUMENTATION.md: add required query_dataset/train_dataset/per_device_query_batch_size args to compute_pairwise_scores snippet
- DOCUMENTATION.md: add required train_dataset arg to compute_self_scores snippet
- DOCUMENTATION.md: point GPT-2 wikitext link to main branch
- DOCUMENTATION.md: document use_full_svd in the ScoreArguments bullet list
- task.py: fix post_process_per_sample_gradient docstring to reference the actual enable_post_process_per_sample_gradient flag

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The score forward/backward autocast uses score_args.amp_dtype, but
enable_grad_scaler was gated on factor_args.amp_dtype. Users who set
score_args.amp_dtype=torch.float16 without also setting
factor_args.amp_dtype=torch.float16 silently ran fp16 backwards with
GradScaler disabled, causing per-sample gradients to underflow.

Same fix applied across compute_pairwise_scores_with_loaders,
compute_pairwise_query_aggregated_scores_with_loaders,
compute_self_scores_with_loaders, and
compute_self_measurement_scores_with_loaders.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nds check

The error message says "must be in range [0, N)" but the check used
strict >, accepting boundary value N. A user passing target=N got a
late IndexError after expensive setup instead of an early clear
ValueError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covariance matrices are PSD, but torch.linalg.eigh can return tiny
negative eigenvalues due to roundoff. Without the clamp, the Kfac
damping path (lambda_matrix.add_(damping).reciprocal_()) can produce
divisions by near-zero or negative values when a damping heuristic
gets dragged down by the negative noise.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TrackedLinear.compute_pairwise_score returns (Q, B, T) when per-token
scores are enabled, but TrackedConv2d ignored the flag and returned
(Q, B). On a mixed model:
- aggregated path raised a misleading DIMENSION_NOT_MATCH error from
  dot_product.py's catch-all.
- compute_per_module_scores=True path returned inconsistent shapes
  across modules without any error.

Now Conv2d raises UnsupportableModuleError early with a clear message
suggesting either disabling per-token scoring or excluding Conv2d from
tracked modules. Added a regression test in tests/scores.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@pomonam pomonam merged commit 19088a8 into main May 23, 2026
8 checks passed
@pomonam pomonam deleted the fix/per-module-accumulate-iterations branch May 23, 2026 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant