FLTest

A testbed for evaluating the privacy and robustness of Privacy-Preserving Federated Learning (PPFL). FLTest gives software-defined control and visibility into FL testing: run the same experiment across multiple FL frameworks, inject attacks and defenses as composable hooks, and apply differential and metamorphic tests plus a pitfall checker — all from a single YAML config.

FLTest is the NSF PDaSP (Track 3) FLTEST project — A Testbed for Enhancing Privacy and Robustness of Federated Learning Systems — supported by the U.S. National Science Foundation (Award #2452817-19).

Principal Investigators: Ali Anwar (University of Minnesota) · Muhammad Ali Gulzar (Virginia Tech) · Fatima Anwar (University of Massachusetts Amherst)

Why

A survey of 50 FL robustness papers found inconsistent evaluation setups (MNIST-only datasets, IID-only data, naive attacks, no personalized metrics), which inflate privacy and robustness claims. FLTest makes a rigorous setup the default and checks for the common pitfalls.

Highlights

One abstraction, many backends. Every FL framework implements a single run_simulation() adapter. Built in: a dependency-light reference PyTorch FedAvg oracle, Flower, and NVFlare (optional extra).
Everything is a hook. Attacks, defenses, and metric listeners are hook plugins that share one HookContext, so a plugin written once runs across every backend and multiple plugins compose on a single run.
Attacks: label_flip, sign_flip, gaussian, backdoor (with attack-success-rate), dlg (gradient-inversion privacy attack).
Defenses (PPFL): gradient_noise (DP-style clip+noise), norm_clip, and robust aggregation krum / trimmed_mean / median.
Differential testing: the same config run across frameworks must agree within a tolerance (cross-framework parity), or the same spec run twice must produce identical results (determinism).
Metamorphic testing: clients_scale (N→2N), rounds_monotonic, attack_strength, dp_noise relations.
Pitfall checker + recommender: flags the six FL-evaluation pitfalls from the project and emits copy-pasteable counter-experiments.
Config fuzzer: any list-valued knob (e.g. dataset: [mnist, cifar10]) is expanded into a grid of runs.
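
As a rough illustration of the config fuzzer idea, the sketch below expands every list-valued knob into a grid of concrete configs. The function name expand_grid and the STRUCTURAL_KEYS exclusion set are assumptions made for this example, not FLTest's actual API; the real expansion rules may differ.

```python
from itertools import product

# Keys whose list values describe structure rather than knobs to fuzz.
# This set is an assumption for the sketch, not FLTest's actual rule.
STRUCTURAL_KEYS = {"attacks", "defenses", "metrics", "runs"}


def expand_grid(config):
    """Return one concrete config per combination of list-valued knobs."""
    knobs = {
        key: value
        for key, value in config.items()
        if isinstance(value, list) and key not in STRUCTURAL_KEYS
    }
    if not knobs:
        return [dict(config)]
    names = list(knobs)
    grid = []
    for combo in product(*(knobs[name] for name in names)):
        concrete = dict(config)          # shallow copy of the base config
        concrete.update(zip(names, combo))  # pin each knob to one value
        grid.append(concrete)
    return grid


config = {
    "dataset": ["mnist", "cifar10"],
    "data_distribution": ["iid", "dirichlet"],
    "num_clients": 10,
    "metrics": ["accuracy", "loss"],
}
runs = expand_grid(config)
for run in runs:
    print(run["dataset"], run["data_distribution"])
```

Two knobs with two values each yield four runs; scalar settings and the structural lists are copied unchanged into every run.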

Install (isolated conda env)

conda env create -f environment.yml      # creates env "fltest" (Python 3.11)
conda activate fltest
pip install -e ".[dev]"                   # core (reference + Flower) + test tooling
pip install -e ".[nvflare]"               # optional NVFlare backend (needs Python <=3.11)

CPU is the default device and is deterministic. Set device: mps (Apple Silicon) or device: cuda for speed, with the usual GPU non-determinism caveat.

Use

fltest list                                              # available frameworks/attacks/defenses/metrics
fltest run         examples/configs/differential.yaml
fltest diff        examples/configs/differential_3way.yaml   # cross-framework parity
fltest metamorphic examples/configs/metamorphic.yaml
fltest pitfalls    examples/configs/pitfalls_demo.yaml
fltest run         examples/configs/attack_label_flip.yaml
fltest run         examples/configs/dlg.yaml                  # privacy attack
fltest run         examples/configs/defense_robust.yaml      # backdoor vs median agg
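
The last command pairs a backdoor attack with median aggregation. To show why a median-style aggregator resists a single bad client, here is a minimal standalone sketch of coordinate-wise median versus plain averaging. It is not FLTest's implementation; the function names are hypothetical, and the outsized update is only a crude stand-in for a malicious client (a real backdoor is subtler).

```python
from statistics import median


def fedavg(updates):
    """Coordinate-wise mean of client update vectors."""
    return [sum(coord) / len(coord) for coord in zip(*updates)]


def median_aggregate(updates):
    """Coordinate-wise median of client update vectors."""
    return [median(coord) for coord in zip(*updates)]


honest = [[0.10, -0.20], [0.12, -0.18], [0.09, -0.22], [0.11, -0.19]]
poisoned = [[5.0, 5.0]]  # one client submits an outsized update
updates = honest + poisoned

mean_update = fedavg(updates)
median_update = median_aggregate(updates)
print("mean:  ", mean_update)
print("median:", median_update)
```

The mean is pulled toward the poisoned client in both coordinates, while the median stays within the range of the honest updates as long as honest clients form a majority.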

Load hook files (slide-style) through the FLTEST_HOOKS environment variable, with no config edits:

export FLTEST_HOOKS=examples/hooks/atk_dlg,examples/hooks/def_gradient_noise
fltest run examples/configs/dlg.yaml
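
To make the hook idea concrete, the following sketch shows how an attack and a defense might compose on one shared context. Only the event names before_client_train and after_client_train come from this README; the HookContext fields, the class names, and the fire helper are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HookContext:
    """Hypothetical shared state passed to every hook (fields are assumed)."""
    client_id: int
    labels: list
    update: list = field(default_factory=list)


class LabelFlipAttack:
    """Attack hook: flip binary labels on the targeted clients."""

    def __init__(self, target_clients):
        self.target_clients = set(target_clients)

    def before_client_train(self, ctx):
        if ctx.client_id in self.target_clients:
            ctx.labels = [1 - y for y in ctx.labels]


class NormClipDefense:
    """Defense hook: scale the update so its L2 norm is at most max_norm."""

    def __init__(self, max_norm):
        self.max_norm = max_norm

    def after_client_train(self, ctx):
        norm = sum(v * v for v in ctx.update) ** 0.5
        if norm > self.max_norm:
            scale = self.max_norm / norm
            ctx.update = [v * scale for v in ctx.update]


def fire(hooks, event, ctx):
    """Call `event` on every hook that defines it, in registration order."""
    for hook in hooks:
        handler = getattr(hook, event, None)
        if handler is not None:
            handler(ctx)


hooks = [LabelFlipAttack(target_clients=[0]), NormClipDefense(max_norm=1.0)]
ctx = HookContext(client_id=0, labels=[0, 1, 1])
fire(hooks, "before_client_train", ctx)
ctx.update = [3.0, 4.0]  # stand-in for the result of local training
fire(hooks, "after_client_train", ctx)
print(ctx.labels, ctx.update)
```

Because each hook only reads and writes the shared context, the same plugin objects could in principle be driven by any backend that fires these events.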

Config sketch (`test_conf.yaml`)

name: my_eval
dataset: [mnist, cifar10]        # a list => fuzzed into a grid
data_distribution: [iid, dirichlet]
model_name: LeNet
num_clients: 10
num_rounds: 10
attacks:  [{name: backdoor, params: {infection_rate: 0.3}, target_clients: [0,1]}]
defenses: [{name: median}]
metrics:  [accuracy, loss, per_client]
runs:                            # one per framework => cross-framework differential
  - {framework: reference}
  - {framework: flwr}
  - {framework: nvflare}
testing:
  differential: {mode: cross_framework, metric: accuracy, tolerance: 0.05}
  metamorphic:
    - {relation: clients_scale, values: [10, 20], tolerance: 0.05}
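
The tolerance fields above could be read as a bound on how far a metric may drift between runs. The sketch below assumes that reading (largest value minus smallest value, compared against the tolerance); the function names and the accuracy numbers are made up for illustration and are not FLTest output or API.

```python
def check_cross_framework(results, metric, tolerance):
    """Compare one metric across frameworks; return (passed, spread)."""
    values = [run[metric] for run in results.values()]
    spread = max(values) - min(values)
    return spread <= tolerance, spread


def check_clients_scale(metric_by_num_clients, tolerance):
    """Metamorphic check: the metric should stay stable as N grows to 2N."""
    values = list(metric_by_num_clients.values())
    return max(values) - min(values) <= tolerance


# Made-up accuracies for illustration only, not measured results.
results = {
    "reference": {"accuracy": 0.912},
    "flwr": {"accuracy": 0.905},
    "nvflare": {"accuracy": 0.931},
}
passed, spread = check_cross_framework(results, "accuracy", tolerance=0.05)
print("cross_framework:", passed, round(spread, 3))

scale_ok = check_clients_scale({10: 0.912, 20: 0.840}, tolerance=0.05)
print("clients_scale:", scale_ok)
```

In this toy data the three frameworks agree within the tolerance, while doubling the client count moves accuracy by more than 0.05 and would be reported as a violated relation.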

Tests

pytest tests/ -q

See docs/ARCHITECTURE.md for the design and examples/configs/ for runnable configs.

Notes & limitations

NVFlare runs each client in its own simulator process, so client-side hooks (attacks/defenses at before/after_client_train) do not apply to it — it is used for cross-framework differential parity of the vanilla FedAvg path. The reference and Flower backends support the full hook surface.
DLG: the default mode, source: gradient, demonstrates raw-gradient invertibility. The source: shared_update mode (reconstruct from the uploaded update) is faithful only under single-step (FedSGD) training, where with plain SGD the uploaded update is the gradient scaled by the learning rate.
A Dockerfile (CPU, Linux) is provided as a deliverable; the verified path is the conda env above.
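
The FedSGD caveat for source: shared_update can be seen on a one-parameter toy model. The sketch below is not FLTest's DLG code; it assumes plain SGD without momentum and a squared-error loss, and all names are hypothetical. It only shows that the gradient at the starting weights is recoverable from the uploaded update after one local step, but not after several.

```python
def grad(w, x, y):
    """Gradient of the loss 0.5 * (w*x - y)**2 with respect to w."""
    return (w * x - y) * x


def local_update(w0, x, y, lr, steps):
    """Run `steps` plain SGD steps from w0 and return the update w - w0."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w, x, y)
    return w - w0


w0, x, y, lr = 0.5, 2.0, 3.0, 0.1
true_grad = grad(w0, x, y)

# FedSGD (one step): the update equals -lr * gradient, so dividing recovers it.
recovered_1 = -local_update(w0, x, y, lr, steps=1) / lr

# Three local steps: dividing by lr * steps gives an average of gradients
# taken at different weights, which is not the gradient at w0.
recovered_3 = -local_update(w0, x, y, lr, steps=3) / (lr * 3)

print(true_grad, recovered_1, recovered_3)
```

With multiple local steps an attacker inverting the update is matching a blend of gradients, so reconstructions from shared_update should be interpreted with that gap in mind.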

Acknowledgement

This material is based upon work supported by the U.S. National Science Foundation under the Privacy-preserving Data Sharing in Practice (PDaSP) program, Track 3 — Usable Tools and Testbeds for Confidential Data Sharing, Award #2452817-19. The PDaSP program is supported by the NSF together with its co-sponsors (U.S. Department of Transportation, Intel, NIST, and Broadcom). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation or its co-sponsors. Program information: https://pdasp.net/projects/.
