Fix/per module accumulate iterations#50
Merged
Merged
Conversation
In compute_dot_products_with_loader, accumulate_iterations was only called in the aggregated-scores branch, leaving stale per-module pairwise score state across iterations. This is benign in the default path (direct assignment overwrites the storage) but corrupts results when a task sets enable_post_process_per_sample_gradient=True, since that path uses add_ on the stale tensor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Use nn.Sequential consistently instead of mixing torch.nn.Sequential - Pass weights_only=True to torch.load to silence the PyTorch 2.6+ warning - Clarify that MnistTask must be defined by the user (subclass Task) - Document the "all_modules" key returned by load_pairwise_scores Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- DOCUMENTATION.md: rename task.tracked_modules to task.get_influence_tracked_modules - DOCUMENTATION.md: add required query_dataset/train_dataset/per_device_query_batch_size args to compute_pairwise_scores snippet - DOCUMENTATION.md: add required train_dataset arg to compute_self_scores snippet - DOCUMENTATION.md: point GPT-2 wikitext link to main branch - DOCUMENTATION.md: document use_full_svd in the ScoreArguments bullet list - task.py: fix post_process_per_sample_gradient docstring to reference the actual enable_post_process_per_sample_gradient flag Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The score forward/backward autocast uses score_args.amp_dtype, but enable_grad_scaler was gated on factor_args.amp_dtype. Users who set score_args.amp_dtype=torch.float16 without also setting factor_args.amp_dtype=torch.float16 silently ran fp16 backwards with GradScaler disabled, causing per-sample gradients to underflow. Same fix applied across compute_pairwise_scores_with_loaders, compute_pairwise_query_aggregated_scores_with_loaders, compute_self_scores_with_loaders, and compute_self_measurement_scores_with_loaders. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…nds check The error message says "must be in range [0, N)" but the check used strict >, accepting boundary value N. A user passing target=N got a late IndexError after expensive setup instead of an early clear ValueError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covariance matrices are PSD, but torch.linalg.eigh can return tiny negative eigenvalues due to roundoff. Without the clamp, the Kfac damping path (lambda_matrix.add_(damping).reciprocal_()) can produce divisions by near-zero or negative values when a damping heuristic gets dragged down by the negative noise. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TrackedLinear.compute_pairwise_score returns (Q, B, T) when per-token scores are enabled, but TrackedConv2d ignored the flag and returned (Q, B). On a mixed model: - aggregated path raised a misleading DIMENSION_NOT_MATCH error from dot_product.py's catch-all. - compute_per_module_scores=True path returned inconsistent shapes across modules without any error. Now Conv2d raises UnsupportableModuleError early with a clear message suggesting either disabling per-token scoring or excluding Conv2d from tracked modules. Added a regression test in tests/scores. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
accumulate_iterationsfor per-module pairwise scores incompute_dot_products_with_loader(
kronfluence/score/dot_product.py). The call was nested inside the aggregated-scoreselsebranch, leavingPAIRWISE_SCORE_MATRIX_NAMEstorage uncleared between iterations. Benign with the default direct-assignment path (line 86 in
pairwise_score.py), but corrupts resultswhen a
Tasksetsenable_post_process_per_sample_gradient=True, since that path usesadd_on the stale tensor and per-module scoresaccumulate across iterations.
eval_datasetwas loading the MNIST training split (train=True) instead of the test split.Test plan
tests/scores/test_pairwise_scores.py::test_per_module_scores_equivalencestill passes (it doesn't exercise the bug sincethe default task has
enable_post_process_per_sample_gradient=False).enable_post_process_per_sample_gradient=Truewithcompute_per_module_scores=Trueto lockin the fix.
One thing to flag: we haven't actually solved the original "build is failing on master" question. The failed CI run's logs had expired, and
locally all tests pass for me on Python 3.10. If you can re-run the failed workflow on GitHub (or push any small commit to retrigger CI),
I can look at the live logs and dig into the real cause.