Releases: SEED-VT/FLTest

v0.2.0-alpha.1

16 Jun 15:54

v0.2.0-alpha.1 Pre-release

Alpha release. Simulation only, single machine. APIs and config schema may change.

FLTest evaluates the privacy and robustness of privacy-preserving federated learning (PPFL):
run the same config across FL frameworks, inject attacks/defenses as hooks, and apply
differential and metamorphic tests.

Features

Backends (one run_simulation() interface):

  • reference — pure-PyTorch FedAvg, deterministic on CPU, full hook coverage
  • flwr — Flower (Ray simulation)
  • nvflare — NVIDIA FLARE simulator (extra: pip install -e ".[nvflare]")

Plugins (composable hooks; client-side hooks do not apply to nvflare, see Limitations):

  • Attacks: label_flip, sign_flip, gaussian, backdoor (reports ASR), dlg (gradient inversion)
  • Defenses: gradient_noise (clip+noise), norm_clip, krum, trimmed_mean, median
  • Metrics: accuracy, loss, attack_success_rate, DLG reconstruction (mse/psnr/label_recovery), per-client accuracy
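
To make the robust-aggregation defenses listed above concrete, here is a minimal pure-Python sketch of a coordinate-wise median and a trimmed mean over client updates. It illustrates the general techniques only and is not FLTest's implementation; the function names, the `trim` parameter, and the toy update vectors are all assumptions made for the example.

```python
from statistics import median


def coordinate_median(updates):
    """Coordinate-wise median of equally sized client update vectors."""
    return [median(col) for col in zip(*updates)]


def trimmed_mean(updates, trim=1):
    """Coordinate-wise mean after dropping the `trim` lowest and highest values."""
    if 2 * trim >= len(updates):
        raise ValueError("trim would discard every client")
    result = []
    for col in zip(*updates):
        kept = sorted(col)[trim:len(col) - trim]
        result.append(sum(kept) / len(kept))
    return result


# Toy data (hypothetical): three honest clients plus one sign-flipped, scaled update.
updates = [[1.0, 2.0], [1.2, 1.8], [0.8, 2.2], [-10.0, -20.0]]

plain_mean = [sum(col) / len(col) for col in zip(*updates)]
robust_median = coordinate_median(updates)
robust_trimmed = trimmed_mean(updates, trim=1)
```

On this toy input the plain mean is dragged to negative values by the single malicious update, while both robust aggregates stay close to the honest clients' values.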

Testing:

  • Differential: cross-framework parity and within-framework determinism
  • Metamorphic: clients_scale, rounds_monotonic, attack_strength, dp_noise
  • Pitfall checker (6 checks) + counter-experiment suggestions
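
The two test styles above can be sketched in a few lines of Python. This is an assumed interpretation, not FLTest's code: the release notes do not specify the exact relations or tolerances, so the function names, the `tol` and `slack` parameters, and the metric values below are hypothetical placeholders rather than measured results.

```python
def parity_mismatches(metrics_a, metrics_b, tol=0.01):
    """Differential check: names of shared metrics that differ by more than tol."""
    shared = metrics_a.keys() & metrics_b.keys()
    return sorted(k for k in shared if abs(metrics_a[k] - metrics_b[k]) > tol)


def rounds_monotonic(acc_fewer_rounds, acc_more_rounds, slack=0.02):
    """Metamorphic relation (assumed): training for more rounds should not
    reduce accuracy by more than `slack`."""
    return acc_more_rounds >= acc_fewer_rounds - slack


# Placeholder metrics for the same config on two backends (not real results).
run_a = {"accuracy": 0.91, "loss": 0.30}
run_b = {"accuracy": 0.90, "loss": 0.45}

mismatches = parity_mismatches(run_a, run_b, tol=0.02)
holds = rounds_monotonic(acc_fewer_rounds=0.80, acc_more_rounds=0.85)
violated = not rounds_monotonic(acc_fewer_rounds=0.80, acc_more_rounds=0.70)
```

A differential test needs no ground truth, only a second run to compare against; a metamorphic test needs only a pair of related configs and the expected direction of change.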

Other:

  • Config fuzzer (list-valued knobs expand to a grid)
  • Datasets: MNIST, Fashion-MNIST, CIFAR-10; partitioners: iid, dirichlet, pathological
  • CLI: fltest run|diff|metamorphic|pitfalls|list, JSON reports
  • Docs site, CPU Dockerfile, pytest suite
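
As an illustration of how list-valued knobs can expand to a grid, the sketch below takes the Cartesian product over every list-valued entry of a flat config dict. It is a hedged example, not the fuzzer's actual code: `expand_grid` and the config keys (`backend`, `rounds`, `partitioner`) are assumptions and may not match FLTest's config schema.

```python
from itertools import product


def expand_grid(config):
    """Expand a flat config into one concrete config per combination of
    list-valued entries; scalar entries are kept as they are."""
    keys = list(config)
    axes = [v if isinstance(v, list) else [v] for v in config.values()]
    return [dict(zip(keys, combo)) for combo in product(*axes)]


# Hypothetical config: two list-valued knobs and one scalar.
base = {
    "backend": ["reference", "flwr"],
    "rounds": [5, 10],
    "partitioner": "dirichlet",
}

grid = expand_grid(base)
```

Here two knobs with two values each yield four concrete configs. The grid grows multiplicatively with each additional list-valued knob, which matters when every config is a full simulation run.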

Limitations

  • Simulation only. No distributed/remote-node deployment; clients run as local
    processes/Ray actors. Scale is bounded by one machine.
  • NVFlare runs clients in separate processes, so client-side hooks (attacks/defenses at
    before/after_client_train) do not apply to it; it is used for vanilla-FedAvg parity only.
    The reference and flwr backends have full hook coverage.
  • DLG with source: shared_update is only valid for single-step (FedSGD) training;
    otherwise use source: gradient (the default).
  • attack_success_rate is unreliable when accuracy is very low (a degenerate model);
    there is no lift-corrected metric yet.
  • Pitfall checker inspects the config, not run results.
  • Determinism holds on CPU only; runs on MPS/CUDA are not reproducible.
  • gradient_noise is clip+noise, not a formal (epsilon, delta) DP accountant. No HE/SMC.
  • Requires Python 3.11 (NVFlare). Image-classification tasks only.
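
To show what "clip+noise" means in the gradient_noise limitation above, here is a minimal sketch of the general technique: rescale an update to a maximum L2 norm, then add Gaussian noise per coordinate. It is not FLTest's implementation; the function name and the `clip_norm`, `noise_std`, and `rng` parameters are assumptions. As in the limitation itself, nothing here tracks an (epsilon, delta) privacy budget.

```python
import math
import random


def clip_and_noise(update, clip_norm=1.0, noise_std=0.1, rng=None):
    """Clip `update` to L2 norm <= clip_norm, then add N(0, noise_std^2) noise
    to each coordinate. No privacy accounting is performed."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is repeatable
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale + rng.gauss(0.0, noise_std) for x in update]


# With noise disabled, only the clipping step is visible.
clipped_only = clip_and_noise([3.0, 4.0], clip_norm=1.0, noise_std=0.0)

# With noise enabled, the same seed gives the same output.
noisy_a = clip_and_noise([3.0, 4.0], rng=random.Random(42))
noisy_b = clip_and_noise([3.0, 4.0], rng=random.Random(42))
```

Turning this into a formal guarantee would additionally require calibrating `noise_std` to `clip_norm` and composing the privacy loss across rounds, which is the job of an accountant.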

Install

conda env create -f environment.yml && conda activate fltest
pip install -e ".[dev]"        # reference + Flower
pip install -e ".[nvflare]"    # optional NVFlare backend
fltest list

Docs

https://seed-vt.github.io/FLTest/