AndrewKim1997/anti-regularization-parametric-models

arXiv: 2508.17412 · CI · Docker Compose · Python 3.10–3.12 · License: MIT

Anti-regularization (AR) for Parametric Models

A simple, safe “negative regularization” that boosts expressivity when data are scarce, then fades out with a sample-size–dependent decay. Works with linear models and shallow MLPs, with stability safeguards.

  • Paper: Convergence and Generalization of Anti-regularization for Parametric Models (https://arxiv.org/abs/2508.17412)
  • Core idea: add a sign-reversed reward term early to reduce underfitting, then decay it with a power-law schedule so training converges back to standard ERM as data grow. Stability is enforced by two safeguards: a trust-region projection and gradient clipping.

✨ TL;DR

  • Small samples: AR slightly increases the effective degrees of freedom (DoF) to counter underfitting.
  • As n grows: the decay schedule shrinks λ → 0, recovering the baseline without hurting generalization.
  • Safety: spectral/trust-region safety condition + clipping prevent divergence; we also log output-scale ratio and clipping/projection rates.
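To make the TL;DR concrete, here is a minimal NumPy sketch for linear least squares. It assumes the AR term is a sign-reversed L2 penalty, i.e. minimizing (1/2n)||y - Xw||^2 + (λ/2)||w||^2 with λ < 0; the helper names are hypothetical, not the repository's API. The eigenvalue test is one natural reading of a spectral safety condition (keep X^T X / n + λI positive definite), and tr(S_λ) is the effective DoF of the hat matrix S_λ = X (X^T X + nλI)^-1 X^T.

```python
import numpy as np


def check_spectral_safety(X, lam):
    """Raise if X^T X / n + lam * I is not positive definite (illustrative safety test)."""
    n = X.shape[0]
    lam_min = np.linalg.eigvalsh(X.T @ X / n)[0]  # eigvalsh returns eigenvalues in ascending order
    if lam + lam_min <= 0:
        raise ValueError(f"unsafe lambda={lam}: need lambda > {-lam_min:.4g}")


def anti_ridge_fit(X, y, lam):
    """Minimize (1/2n)||y - Xw||^2 + (lam/2)||w||^2; lam < 0 is the AR (sign-reversed) case."""
    check_spectral_safety(X, lam)
    n, p = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)


def effective_dof(X, lam):
    """tr(S_lam) for the hat matrix S_lam = X (X^T X + n*lam*I)^{-1} X^T."""
    check_spectral_safety(X, lam)
    n, p = X.shape
    S = X @ np.linalg.solve(X.T @ X + n * lam * np.eye(p), X.T)
    return float(np.trace(S))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 5))
    y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=50)
    print(effective_dof(X, 0.0))    # ~5.0: equals p for ordinary least squares
    print(effective_dof(X, -0.05))  # > 5: the negative penalty adds effective DoF
    print(anti_ridge_fit(X, y, -0.05))
```

Each eigen-direction contributes d_i / (d_i + nλ) to tr(S_λ), which exceeds 1 when λ < 0, so the DoF rises above p; the safety check keeps every denominator positive.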

📊 Results at a glance

AR overview — lambda decay, optimizer trajectory, safety region


🛠️ Installation

# 1) clone
git clone https://github.com/AndrewKim1997/anti-regularization-parametric-models.git
cd anti-regularization-parametric-models

# 2) Python deps (3.10+)
python -m venv .venv && source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -r requirements.txt

# (optional) editable install
# pip install -e .

Docker (optional)

docker compose -f docker/docker-compose.yml up --build

📁 Repository structure

.github/workflows/ci.yml   # unit tests & config checks on push/PR
configs/                   # ready-to-run YAMLs (experiments & ablations)
data/                      # (auto)downloaded datasets live under data/raw/*
docker/                    # docker-compose.yml for CPU/CUDA images
experiments/               # per-run folders with metrics & logs
results/                   # collected CSVs (paper tables)
src/ar/                    # library code & main entrypoint (src.ar.run)
tests/                     # unit tests
LICENSE, CITATION.cff, pyproject.toml, requirements.txt, README.md

⚡ Quick start

All runs use the module entrypoint:

python -m src.ar.run --config <path/to/config.yaml>

Regression

# UCI Concrete
python -m src.ar.run --config configs/concrete_reg.yaml

# UCI Airfoil
python -m src.ar.run --config configs/airfoil_reg.yaml

Classification

# MNIST
python -m src.ar.run --config configs/mnist_cls.yaml

# CIFAR-10
python -m src.ar.run --config configs/cifar10_cls.yaml

⏱️ 1-minute sanity check (small slice)

python -m src.ar.run --config configs/mnist_cls.yaml \
  --only_seed 0 --only_optimizer adam --add_baseline_zero

⚙️ Configuration & common flags

  • All experiment settings live in configs/*.yaml (dataset, model, optimizer, λ schedule, logging).

  • Useful CLI flags:

    • --only_seed, --only_fraction, --only_optimizer – restrict a run to the given seed, data fraction, or optimizer

    • --add_baseline_zero – evaluate the λ=0 baseline under the same conditions

    • Ablations:

      • no_trust_region (disable projection safeguard)
      • no_grad_clip (disable gradient clipping)
      • l2 (replace AR with positive L2 / Tikhonov regularization)
      • constant_lambda (no decay; fixed λ = λ0)
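Because every run is a plain module call, seed sweeps are easy to script. The sketch below is a minimal example that only uses the flags documented above; build_commands and run_all are hypothetical helpers, and the commands should be launched from the repository root so that src.ar.run resolves.

```python
import subprocess
import sys


def build_commands(config, seeds, optimizer=None, add_baseline=True):
    """One `python -m src.ar.run` argv per seed, using only the documented CLI flags."""
    commands = []
    for seed in seeds:
        cmd = [sys.executable, "-m", "src.ar.run", "--config", config, "--only_seed", str(seed)]
        if optimizer is not None:
            cmd += ["--only_optimizer", optimizer]
        if add_baseline:
            cmd.append("--add_baseline_zero")  # also evaluate the lambda = 0 baseline
        commands.append(cmd)
    return commands


def run_all(commands):
    """Run sequentially; check=True stops the sweep at the first failing run."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, run_all(build_commands("configs/mnist_cls.yaml", seeds=[0, 1, 2], optimizer="adam")) repeats the sanity-check invocation for three seeds.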
Configuration knobs
  • λ0 grid, α power-law decay (sample-size–dependent)
  • Safety: projection operator (trust-region radius), gradient clipping
  • Diagnostics: output-scale ratio (ρ), clipping rate (r_clip), projection rate (r_proj)

🛡️ Safety & diagnostics

AR is wrapped with lightweight safeguards:

  • Projection (trust-region constraint): if a step exceeds the trust-region radius, we project the parameters back onto the trust region.
  • Gradient clipping: classic clipping to prevent exploding updates.
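The two safeguards can be sketched as a single NumPy update. This is a hypothetical helper, not the repository's implementation: it assumes gradient-norm clipping and a Euclidean trust region, i.e. an L2 ball of the given radius around a chosen center. Passing the current parameters as the center caps the length of each step (the "step exceeds the radius" reading above); passing a fixed reference point such as the initialization bounds the parameters themselves. The returned flags are what clipping and projection rates would be computed from.

```python
import numpy as np


def safe_step(w, grad, lr, clip_norm, center, radius):
    """Gradient step with norm clipping, then Euclidean projection onto a trust region.

    Hypothetical helper: the trust region is assumed to be the L2 ball
    {v : ||v - center|| <= radius}. Returns (new_w, clipped, projected).
    """
    g_norm = np.linalg.norm(grad)
    clipped = bool(g_norm > clip_norm)
    if clipped:
        grad = grad * (clip_norm / g_norm)  # rescale so ||grad|| == clip_norm
    w_new = w - lr * grad
    offset = w_new - center
    dist = np.linalg.norm(offset)
    projected = bool(dist > radius)
    if projected:
        w_new = center + offset * (radius / dist)  # closest point of the ball
    return w_new, clipped, projected


def ar_grad(w, X, y, lam):
    """Gradient of (1/2n)||y - Xw||^2 + (lam/2)||w||^2 (lam < 0 for AR; illustrative objective)."""
    n = X.shape[0]
    return X.T @ (X @ w - y) / n + lam * w
```

Clipping runs before projection, so projection always acts on an already bounded step and the final parameters are guaranteed to lie inside the ball.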

Logged diagnostics (per run):

  • output-scale ratio (ρ) – AR vs baseline output-norm ratio
  • clipping rate (r_clip) – fraction of updates affected by clipping
  • projection rate (r_proj) – fraction of updates corrected by projection

Use these to verify safety (e.g., keep ρ near 1 and moderate r_clip / r_proj).
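A minimal sketch of how the three diagnostics could be computed, assuming ρ uses the L2 (Frobenius) norm of model outputs on the same evaluation inputs and that each update records boolean clip and projection flags. The function name and dictionary keys are illustrative, not the repository's logging format.

```python
import numpy as np


def safety_diagnostics(out_ar, out_base, clip_flags, proj_flags):
    """Return rho, r_clip, r_proj (illustrative definitions).

    out_ar / out_base: outputs of the AR and baseline models on the same inputs.
    clip_flags / proj_flags: one boolean per update step.
    """
    rho = np.linalg.norm(out_ar) / np.linalg.norm(out_base)
    r_clip = float(np.mean(clip_flags)) if len(clip_flags) else 0.0
    r_proj = float(np.mean(proj_flags)) if len(proj_flags) else 0.0
    return {"rho": float(rho), "r_clip": r_clip, "r_proj": r_proj}


d = safety_diagnostics(np.array([3.0, 4.0]), np.array([0.0, 5.0]), [True, False, False, False], [False] * 4)
# d == {"rho": 1.0, "r_clip": 0.25, "r_proj": 0.0}
```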


📈 Schedules & rules of thumb

  • Power-law decay schedule: |λ(n)| = |λ(n0)| (n0/n)^α, where n0 is a reference sample size

    • Regression: set α ≈ 1
    • Classification: set α ≈ 0.5 (to stay conservative, keep α ≥ 0.5)
  • (Optional) DoF targeting: keep per-sample complexity roughly constant by controlling tr(S_λ)/n, where tr(S_λ) is the effective degrees of freedom.

These heuristics balance bias–variance and help ensure convergence and stable generalization.
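The power-law rule translates directly into code. A minimal sketch, assuming λ is stored as a negative number (the AR sign convention) and that n0 and λ(n0) come from the config; ar_lambda is a hypothetical name.

```python
def ar_lambda(n, lam0, n0, alpha):
    """Power-law decay |lambda(n)| = |lam0| * (n0 / n) ** alpha, returned with the AR (negative) sign.

    lam0 is the value at the reference sample size n0 (hypothetical helper).
    """
    if n <= 0 or n0 <= 0:
        raise ValueError("sample sizes must be positive")
    return -abs(lam0) * (n0 / n) ** alpha


if __name__ == "__main__":
    for n in (100, 400, 1600):
        # alpha = 1 (regression rule of thumb) vs alpha = 0.5 (classification)
        print(n, ar_lambda(n, -0.1, 100, 1.0), ar_lambda(n, -0.1, 100, 0.5))
```

Quadrupling n divides |λ| by 4 when α = 1 but only by 2 when α = 0.5, so the classification setting keeps the anti-regularization active over a wider range of sample sizes.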


🧪 Reproduce paper experiments

Run the four main configs to reproduce headline results:

python -m src.ar.run --config configs/concrete_reg.yaml
python -m src.ar.run --config configs/airfoil_reg.yaml
python -m src.ar.run --config configs/mnist_cls.yaml
python -m src.ar.run --config configs/cifar10_cls.yaml
  • Outputs: experiments/<run-id>/... and aggregated CSVs in results/*.csv.

  • Datasets:

    • UCI Concrete: data/raw/concrete/Concrete_Data.xls
    • UCI Airfoil: data/raw/airfoil/airfoil_self_noise.dat
    • MNIST / CIFAR-10: auto-downloaded via torchvision
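To pool the aggregated tables for your own analysis, a standard-library sketch like the one below can help. It assumes only that results/*.csv are ordinary CSV files with a header row; their column names are not documented here, so the code does not rely on any, and collect_results is a hypothetical helper.

```python
import csv
from pathlib import Path


def collect_results(results_dir="results"):
    """Read every CSV in results_dir into a list of dict rows, tagged with the source file name."""
    rows = []
    for path in sorted(Path(results_dir).glob("*.csv")):
        with path.open(newline="") as f:
            for row in csv.DictReader(f):
                row["source_file"] = path.name  # keep track of which table each row came from
                rows.append(row)
    return rows
```

Values come back as strings, exactly as csv.DictReader yields them; convert the numeric columns you need after inspecting the headers.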

✅ Tests

pytest -q

CI validates loaders, config schemas, and logging. PRs must pass CI.


🤝 Contributing

Issues and PRs are welcome. Suggested contributions: new decay schedules, optimizer studies, safety diagnostics, or additional datasets.


📚 Citation

@article{kim2025convergence,
  title={Convergence and Generalization of Anti-Regularization for Parametric Models},
  author={Kim, Dongseok and Jeong, Wonjun and Oh, Gisung},
  journal={arXiv preprint arXiv:2508.17412},
  year={2025}
}

Also see CITATION.cff in this repository.


📝 License

This project is released under the MIT License; see LICENSE for the full terms.

About

Reproducibility code for the paper "Convergence and Generalization of Anti-Regularization for Parametric Models".
