Releases: SEED-VT/FLTest
v0.2.0-alpha.1
Alpha release. Simulation only, single machine. APIs and config schema may change.
FLTest evaluates the privacy and robustness of privacy-preserving federated learning (PPFL):
run the same config across FL frameworks, inject attacks/defenses as hooks, and apply
differential and metamorphic tests.
Features
Backends (one run_simulation() interface):
- reference: pure-PyTorch FedAvg, deterministic on CPU, full hook coverage
- flwr: Flower (Ray simulation)
- nvflare: NVIDIA FLARE simulator (extra: pip install -e ".[nvflare]")
Plugins (composable hooks; full coverage on reference and flwr, see Limitations for nvflare):
- Attacks: label_flip, sign_flip, gaussian, backdoor (reports ASR), dlg (gradient inversion)
- Defenses: gradient_noise (clip+noise), norm_clip, krum, trimmed_mean, median
- Metrics: accuracy, loss, attack_success_rate, DLG reconstruction (mse/psnr/label_recovery), per-client accuracy
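As an illustration of what the trimmed_mean and median defenses compute, here is a minimal sketch of coordinate-wise robust aggregation over flat client updates. It is a textbook version written for this note, not FLTest's implementation; the function names and the list-of-floats representation are assumptions.

```python
from statistics import median


def coordinate_median(updates):
    """Coordinate-wise median of equally sized client update vectors."""
    return [median(column) for column in zip(*updates)]


def trimmed_mean(updates, trim=1):
    """Coordinate-wise mean after dropping the `trim` smallest and
    `trim` largest values in every coordinate."""
    if 2 * trim >= len(updates):
        raise ValueError("trim would discard every client update")
    aggregated = []
    for column in zip(*updates):
        kept = sorted(column)[trim:len(column) - trim]
        aggregated.append(sum(kept) / len(kept))
    return aggregated


# Three honest clients plus one outlier (hypothetical values).
updates = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [100.0, -100.0]]
print(coordinate_median(updates))
print(trimmed_mean(updates, trim=1))
```

Both aggregators ignore the outlier in each coordinate, whereas a plain mean would be dominated by it.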
Testing:
- Differential: cross-framework parity, and within-framework determinism
- Metamorphic: clients_scale, rounds_monotonic, attack_strength, dp_noise
- Pitfall checker (6 checks) + counter-experiment suggestions
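The two differential checks can be pictured with a small sketch, assuming a run yields a list of per-round accuracies. toy_run, is_deterministic, and within_parity are hypothetical stand-ins written for this note; they are not part of FLTest's API, and the tolerance is arbitrary.

```python
import math
import random


def toy_run(seed, rounds=3):
    """Stand-in for a simulation run: per-round accuracy from a seeded RNG."""
    rng = random.Random(seed)
    return [round(0.5 + 0.1 * r + rng.uniform(-0.01, 0.01), 6) for r in range(rounds)]


def is_deterministic(run, seed):
    """Within-framework determinism: same seed, identical metrics."""
    return run(seed) == run(seed)


def within_parity(metrics_a, metrics_b, tol):
    """Cross-framework parity: metrics agree round by round within `tol`."""
    return len(metrics_a) == len(metrics_b) and all(
        math.isclose(a, b, abs_tol=tol) for a, b in zip(metrics_a, metrics_b)
    )


print(is_deterministic(toy_run, seed=0))
print(within_parity(toy_run(0), toy_run(1), tol=0.05))
```

Determinism is an exact comparison, while parity needs a tolerance because different frameworks rarely produce bit-identical results.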
Other:
- Config fuzzer (list-valued knobs expand to a grid)
- Datasets: MNIST, Fashion-MNIST, CIFAR-10; partitioners: iid, dirichlet, pathological
- CLI: fltest run|diff|metamorphic|pitfalls|list, JSON reports
- Docs site, CPU Dockerfile, pytest suite
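To show what "list-valued knobs expand to a grid" can mean in practice, here is a minimal sketch using a Cartesian product. The expand_grid helper and the config keys (backend, num_clients, rounds) are hypothetical; FLTest's actual config schema may differ.

```python
from itertools import product


def expand_grid(config):
    """Return one concrete config per combination of list-valued entries."""
    list_keys = [key for key, value in config.items() if isinstance(value, list)]
    combos = product(*(config[key] for key in list_keys))
    # Scalar entries are copied unchanged into every expanded config.
    return [{**config, **dict(zip(list_keys, combo))} for combo in combos]


# Hypothetical config: two list-valued knobs, one scalar.
base = {"backend": ["reference", "flwr"], "num_clients": [5, 10], "rounds": 3}
for cfg in expand_grid(base):
    print(cfg)
```

A config with no list-valued entries expands to a single copy of itself, so the same code path can serve both plain and fuzzed runs.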
Limitations
- Simulation only. No distributed/remote-node deployment; clients run as local
  processes/Ray actors. Scale is bounded by one machine.
- NVFlare runs clients in separate processes, so client-side hooks (attacks/defenses at
  before/after_client_train) don't apply to it; it's used for vanilla-FedAvg parity only.
  reference and flwr have full hook coverage.
- DLG source: shared_update is only valid for single-step (FedSGD) training; use
  source: gradient (default) otherwise.
- attack_success_rate is unreliable when accuracy is very low (degenerate model); no
  lift-corrected metric yet.
- Pitfall checker inspects the config, not run results.
- Determinism is CPU-only; MPS/CUDA are not reproducible.
- gradient_noise is clip+noise, not a formal (epsilon, delta) DP accountant. No HE/SMC.
- Requires Python 3.11 (NVFlare). Image-classification tasks only.
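For readers unfamiliar with the clip+noise pattern named in the gradient_noise limitation, a minimal sketch follows. It clips an update to an L2 norm bound and adds Gaussian noise per coordinate; it tracks no privacy budget. The function name and parameters are assumptions for illustration, not FLTest's hook signature.

```python
import math
import random


def clip_and_noise(update, clip_norm, noise_std, rng):
    """Scale `update` so its L2 norm is at most `clip_norm`, then add
    independent Gaussian noise with standard deviation `noise_std`."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale + rng.gauss(0.0, noise_std) for x in update]


rng = random.Random(0)  # seeded so repeated runs give the same noise
print(clip_and_noise([3.0, 4.0], clip_norm=1.0, noise_std=0.1, rng=rng))
```

Turning this into a formal (epsilon, delta) guarantee would additionally require calibrating noise_std to clip_norm and accounting for composition across rounds, which is the piece the limitation says is absent.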
Install
conda env create -f environment.yml && conda activate fltest
pip install -e ".[dev]" # reference + Flower
pip install -e ".[nvflare]" # optional NVFlare backend
fltest list