Hi authors,
Thank you for releasing TraceFL. I’ve been reproducing your results using the provided implementation and found the method very insightful for client-level attribution in federated learning.
Observation
I was able to reproduce the reported behavior of TraceFL within the range of communication rounds presented in the paper (roughly 20–60 rounds), where:
Attribution remains highly accurate
Client responsibility is clearly distinguishable
However, when extending training to larger numbers of communication rounds (e.g., 200–300 rounds), I consistently observe:
Attribution scores becoming less discriminative across clients
Increased fluctuation or flattening of contribution distributions
Degradation in client localization performance.
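To make "less discriminative" concrete, here is a minimal sketch of how the flattening could be quantified per round. The helper `flatness_metrics` is hypothetical and not part of the TraceFL repository, and the two score vectors are made-up illustrative values, not measured results.

```python
import math


def flatness_metrics(scores):
    """Return (normalized_entropy, top1_margin) for one attribution vector.

    scores: non-negative per-client contribution scores (hypothetical input,
    renormalized here so they sum to 1).
    """
    if len(scores) < 2:
        raise ValueError("need scores for at least two clients")
    if any(s < 0 for s in scores):
        raise ValueError("scores must be non-negative")
    total = sum(scores)
    if total <= 0:
        raise ValueError("scores must have a positive sum")

    probs = [s / total for s in scores]

    # Shannon entropy divided by log(n): 0 means all mass on one client,
    # 1 means a perfectly uniform (fully flattened) distribution.
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    normalized_entropy = entropy / math.log(len(probs))

    # Gap between the top-ranked client and the runner-up.
    ranked = sorted(probs, reverse=True)
    top1_margin = ranked[0] - ranked[1]
    return normalized_entropy, top1_margin


# Illustrative values only, not output from TraceFL.
peaked = [0.90, 0.04, 0.03, 0.03]
flat = [0.27, 0.25, 0.24, 0.24]

for name, vec in (("peaked", peaked), ("flat", flat)):
    ent, margin = flatness_metrics(vec)
    print(f"{name}: normalized entropy={ent:.3f}, top-1 margin={margin:.3f}")
```

Tracking these two numbers across rounds would show whether the scores drift toward uniform (entropy approaching 1, margin approaching 0) or merely fluctuate.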
Setup
Same dataset and configuration as described in the paper
Using the official repository implementation
Only modifications: increased the number of communication rounds, ran provenance both after each round and after training completed, and deleted the exp_key instead of saving it.
Questions
Is TraceFL primarily intended for early-to-mid training phases, rather than fully converged models?
Have you observed similar behavior when training for a larger number of rounds?
Could this be related to:
Model convergence leading to reduced gradient/activation signal
Client homogenization across rounds
Softmax-based normalization becoming less discriminative over time?
Are there recommended practices to maintain attribution quality in later rounds (e.g., normalization strategies, filtering, or temporal aggregation)?
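As a concrete example of the temporal aggregation mentioned above, here is a minimal sketch that smooths per-client scores across rounds with an exponential moving average. This is only one possible approach, not something taken from the paper or the repository. The function name `ema_aggregate`, the `alpha` parameter, and the dict-per-round input format are all assumptions made for illustration.

```python
def ema_aggregate(per_round_scores, alpha=0.3):
    """Exponentially smooth per-client attribution scores across rounds.

    per_round_scores: list of dicts {client_id: score}, oldest round first
                      (hypothetical format, not TraceFL's actual output).
    alpha: weight given to the newest round, with 0 < alpha <= 1.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")

    smoothed = {}
    for round_scores in per_round_scores:
        for client_id, score in round_scores.items():
            if client_id in smoothed:
                # Blend the new score with the running average.
                smoothed[client_id] = (
                    alpha * score + (1 - alpha) * smoothed[client_id]
                )
            else:
                # First round in which this client appears.
                smoothed[client_id] = score
        # Clients absent from this round keep their previous smoothed value.
    return smoothed


# Illustrative values only.
rounds = [
    {"c0": 1.0, "c1": 0.0},
    {"c0": 0.0, "c1": 1.0},
]
print(ema_aggregate(rounds, alpha=0.5))
```

A smaller `alpha` damps round-to-round fluctuation at the cost of reacting more slowly to genuine changes in client responsibility.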
I’d appreciate any insights on whether this is expected behavior or if there are suggested ways to address it.
Thanks again for your work!