A research tool for generating adversarially perturbed license plate overlays on vehicle images, producing structured datasets with reproducibility, transparency, and ethical guardrails.
- Primary Users: researchers, ML practitioners, privacy/robustness teams.
- Design Principle: user-first, safe by default, hackable by experts.
- Default config file: `config.yaml`
- Configurable fields:

      dataset:
        backgrounds: "./backgrounds"
        overlays: "./overlays"
        output: "./dataset"
        n_variants: 10
        random_seed: 1337
      perturbations:
        - name: shapes
          params:
            num_shapes: 20
            min_size: 2
            max_size: 15
        - name: noise
          params:
            intensity: 25
      logging:
        level: "INFO"
        save_metadata: true

- Config hierarchy:
  - Defaults (baked-in)
  - `config.yaml` (if present)
  - CLI overrides (`--n_variants 50`)
Python package should expose:

    from plateshapez import DatasetGenerator
    from plateshapez.perturbations import register, PERTURBATION_REGISTRY

    # Run programmatically
    gen = DatasetGenerator(
        bg_dir="backgrounds",
        overlay_dir="overlays",
        out_dir="dataset",
        perturbations=[
            {"name": "shapes", "params": {"num_shapes": 30}},
            {"name": "noise", "params": {"intensity": 10}},
        ],
    )
    gen.run(n_variants=5)

API surfaces:
- `DatasetGenerator` – orchestrates dataset creation.
- `Perturbation` base class – allows custom perturbations.
- `register` decorator – adds new perturbations into registry.
- `load_config(path)` – parse YAML/JSON configs.
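
To illustrate how a `register` decorator and a `Perturbation` base class could fit together, here is a minimal, self-contained sketch. It is a stand-in rather than the package's actual implementation: the `apply` method, the decorator's `name` argument, and the `InvertPerturbation` example are all assumptions made for illustration.

```python
from typing import Callable


class Perturbation:
    """Base class: subclasses transform an image and keep their parameters."""

    def __init__(self, **params: object) -> None:
        self.params = params

    def apply(self, pixels: list[list[int]]) -> list[list[int]]:
        raise NotImplementedError


# Stand-in registry mapping a name to a perturbation class (assumed layout).
PERTURBATION_REGISTRY: dict[str, type[Perturbation]] = {}


def register(name: str) -> Callable[[type[Perturbation]], type[Perturbation]]:
    """Class decorator that stores a perturbation under `name`, rejecting duplicates."""

    def decorator(cls: type[Perturbation]) -> type[Perturbation]:
        if name in PERTURBATION_REGISTRY:
            raise ValueError(f"Perturbation {name!r} is already registered")
        PERTURBATION_REGISTRY[name] = cls
        return cls

    return decorator


@register("invert")
class InvertPerturbation(Perturbation):
    """Invert 8-bit grayscale pixel values (hypothetical example)."""

    def apply(self, pixels: list[list[int]]) -> list[list[int]]:
        return [[255 - value for value in row] for row in pixels]


perturbation = PERTURBATION_REGISTRY["invert"]()
print(perturbation.apply([[0, 100], [255, 30]]))  # [[255, 155], [0, 225]]
```

Raising on a duplicate name, rather than silently overwriting, keeps a config entry such as `name: noise` unambiguous.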
Command entrypoint: `advplate`

    Usage: advplate [OPTIONS] COMMAND [ARGS]...

### CLI Options
- `--config PATH` - Path to YAML/JSON configuration file
- `--n_variants INT` - Override number of variants per image pair
- `--seed INT` - Random seed for reproducible results (maps to `dataset.random_seed`)
- `--verbose` - Enable verbose logging
- `--debug` - Enable debug logging with full stack traces
- `--dry-run` - Preview generation plan without creating files
- `--as FORMAT` - Output format for `info` command (json|yaml)
#### `generate`
```bash
plateshapez generate --backgrounds ./bg --overlays ./plates --out ./dataset --n_variants 20
```
- Displays progress bar for images generated.
- Prints table of applied perturbations with Rich.

#### `list`
`plateshapez list` shows registered perturbations with docstrings, e.g.:

    Available Perturbations:
    ────────────────────────────────────
    shapes     Random rectangles, ellipses, triangles
    noise      Add Gaussian or salt noise
    warp       Mild geometric warping
    texture    Overlay texture maps

- Errors always show help menu for current command.
- Empty input → display usage guide, not just crash.
- Rich-powered panels and tables for readability.
- `--dry-run` mode prints what would be generated without writing files.
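
A dry run can be sketched as a pure planning step that enumerates the files a run would create and only prints them. The function `plan_outputs` and the `<background>__<overlay>__vNNN.png` naming scheme below are assumptions for illustration; the actual output layout may differ.

```python
from itertools import product
from pathlib import Path


def plan_outputs(
    background_names: list[str],
    overlay_names: list[str],
    out_dir: str,
    n_variants: int,
) -> list[Path]:
    """List the files a run would create, without touching the filesystem."""
    planned: list[Path] = []
    for background, overlay in product(background_names, overlay_names):
        stem = f"{Path(background).stem}__{Path(overlay).stem}"
        for variant_index in range(n_variants):
            # Hypothetical naming scheme: <background>__<overlay>__vNNN.png
            planned.append(Path(out_dir) / f"{stem}__v{variant_index:03d}.png")
    return planned


plan = plan_outputs(["street.jpg", "lot.jpg"], ["plate_a.png"], "dataset", n_variants=2)
for path in plan:
    print(f"[dry-run] would write {path.as_posix()}")
print(f"{len(plan)} files planned, none written")
```

Keeping the plan separate from the write step means the real run can iterate over the same list, so the preview cannot drift from what is actually generated.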
Use pytest.
- Perturbation registry:
  - Adding new perturbation.
  - Duplicate name raises error.
- Perturbation correctness:
  - Shapes draw inside bounds.
  - Noise intensity measurable.
- Pipeline:
  - Generates expected number of images.
  - Metadata file contains valid JSON with correct keys.
- Run `DatasetGenerator` with a small config (tiny bg + plate) and validate:
  - Outputs exist.
  - Metadata matches parameters.
  - CLI returns success codes.
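
As a rough sketch of the metadata check, the test below writes one record and verifies that it is valid JSON with the expected keys. The key set in `REQUIRED_KEYS` and the `write_metadata` helper are assumptions standing in for the real pipeline. Under pytest the `tmp_path` argument would be supplied by the built-in fixture; here the test is also called directly so the sketch runs without pytest.

```python
import json
import tempfile
from pathlib import Path

# Assumed metadata schema; the real pipeline may record different keys.
REQUIRED_KEYS = {"background", "overlay", "variant", "random_seed", "perturbations"}


def write_metadata(path: Path, record: dict) -> None:
    """Stand-in for the pipeline's metadata writer."""
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")


def test_metadata_is_valid_json_with_expected_keys(tmp_path: Path) -> None:
    record = {
        "background": "street.jpg",
        "overlay": "plate_a.png",
        "variant": 0,
        "random_seed": 1337,
        "perturbations": [{"name": "noise", "params": {"intensity": 10}}],
    }
    metadata_path = tmp_path / "street__plate_a__v000.json"
    write_metadata(metadata_path, record)

    # json.loads raises if the file is not valid JSON.
    loaded = json.loads(metadata_path.read_text(encoding="utf-8"))
    assert REQUIRED_KEYS.issubset(loaded)
    assert loaded["perturbations"][0]["params"]["intensity"] == 10


# Direct call with a temporary directory, mimicking pytest's tmp_path fixture.
with tempfile.TemporaryDirectory() as directory:
    test_metadata_is_valid_json_with_expected_keys(Path(directory))
print("metadata test passed")
```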
- `README.md` with quickstart.
- `DATASET_CARD.md` (ethical + responsible use).
- `docs/` folder with:
  - Usage examples (CLI + API).
  - Adding new perturbations.
  - Config reference.
Using Rich + Typer:

    import sys

    import click
    import typer
    from rich.console import Console
    from rich.table import Table

    from plateshapez.config import load_config
    from plateshapez.perturbations.base import PERTURBATION_REGISTRY
    from plateshapez.pipeline import DatasetGenerator

    app = typer.Typer(add_completion=False)
    console = Console()


    # Register as "list" without shadowing the builtin `list` in this module.
    @app.command("list")
    def list_perturbations() -> None:
        """List available perturbations."""
        table = Table(title="Available Perturbations")
        table.add_column("Name", style="cyan")
        table.add_column("Description", style="green")
        for name, cls in PERTURBATION_REGISTRY.items():
            table.add_row(name, cls.__doc__ or "")
        console.print(table)


    @app.command()
    def generate(
        config: str | None = typer.Option(None, "--config", "-c"),
        n_variants: int | None = typer.Option(None, "--n_variants"),
    ) -> None:
        """Generate adversarial dataset."""
        cfg = load_config(config, cli_overrides={"n_variants": n_variants})
        gen = DatasetGenerator(
            cfg["dataset"]["backgrounds"],
            cfg["dataset"]["overlays"],
            cfg["dataset"]["output"],
            cfg["perturbations"],
        )
        gen.run(cfg["dataset"]["n_variants"])
        console.print("[bold green]✓ Dataset generated successfully![/]")


    def main() -> None:
        try:
            app()
        except typer.Exit:
            raise
        except Exception as error:
            console.print(f"[red]Error: {error}[/]")
            # typer.Typer has no get_help(); render help via the underlying Click command.
            command = typer.main.get_command(app)
            with click.Context(command, info_name="advplate") as ctx:
                typer.echo(command.get_help(ctx))  # show help menu always
            # Exit with a plain status code; typer.Exit is only handled inside Click.
            sys.exit(1)


    if __name__ == "__main__":
        main()

- Core library (`advplate/`)
- CLI (`advplate/__main__.py`)
- Unit + integration tests
- Rich-based UX
- Dataset Card
- Examples
- Workflow file: `.github/workflows/ci.yml`
- Goals: fast, deterministic, identical behavior to local checks and pre-commit hooks.
- Key steps:
  - Checkout repository
  - Ensure `uv` is available (self-hosted agent provides it; otherwise install)
  - `uv sync --group dev` to install tooling from `pyproject.toml`
  - Run `./scripts/check.sh`, which executes in order:
    - `uv run ruff format .`
    - `uv run ruff check . --fix`
    - `uv run mypy .`
- Parity:
  - Local `pre-commit` uses the same tools via `uv` to avoid version drift.
  - Hook stages are modernized to `[pre-commit, pre-push]` in `.pre-commit-config.yaml`.
- Runner:
  - Can run on a self-hosted runner (e.g., label `plateshapez`) or `ubuntu-latest` with a uv install step.
- Artifacts/Logs:
  - CI surfaces formatter/linter/type-checker output directly in logs for quick diagnosis.
- Modern typing syntax:
  - Prefer builtin generics and PEP 604 unions:
    - Use `list[str]`, `dict[str, int]`, `set[str]`, etc. instead of `List[str]`, `Dict[str, int]`.
    - Use `str | None` instead of `Optional[str]`.
    - Use `A | B` instead of `Union[A, B]`.
    - Use default `None` as `param: T | None = None` (not `Optional[T]`).
- Imports:
  - Keep imports at file top. No mid-file imports (enforced by review and ruff rules where applicable).
- Naming:
  - Avoid single-letter variables and abbreviations (common in ML, but hurts debuggability).
  - Use descriptive, intention-revealing names (e.g., `image_height`, `num_shapes`, `overlay_path`).
- Formatting:
  - `ruff format` is the canonical formatter. CI and hooks will enforce it.
  - Line length target: 100 (see `pyproject.toml`).
- Rich logging:
  - Provide a `--debug` flag (and/or environment toggle) that enables verbose logs.
  - Debug mode includes module, `file_path:line`, and trace-friendly context to speed up issue isolation.
  - Prefer structured and human-readable messages; avoid cryptic short forms.
- Error handling:
  - Exceptions should surface meaningful context; where safe, include parameters that led to the error.
  - Add targeted log lines at critical pipeline steps: loading config, enumerating inputs, applying perturbations, writing outputs/metadata.
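
The debug-mode and error-context guidelines above can be sketched with the standard library's `logging` module. This is a simplified stand-in: it uses a plain `StreamHandler` so the example has no third-party dependency, where a Rich-based setup would likely attach `rich.logging.RichHandler` instead. The logger name `plateshapez` and the helper names are assumptions.

```python
import logging

LOGGER_NAME = "plateshapez"  # assumed logger name


def configure_logging(debug: bool = False) -> logging.Logger:
    """Set up console logging; debug mode adds module and file:line context."""
    logger = logging.getLogger(LOGGER_NAME)
    logger.handlers.clear()
    handler = logging.StreamHandler()
    if debug:
        log_format = "%(levelname)s %(module)s %(pathname)s:%(lineno)d %(message)s"
    else:
        log_format = "%(levelname)s %(message)s"
    handler.setFormatter(logging.Formatter(log_format))
    logger.addHandler(handler)
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    logger.propagate = False
    return logger


def shift_brightness(pixel_values: list[int], intensity: int) -> list[int]:
    """Toy pipeline step that validates its parameter and reports it on failure."""
    if not 0 <= intensity <= 255:
        # Include the offending parameter so the error is actionable.
        raise ValueError(f"intensity must be in [0, 255], got {intensity}")
    return [min(255, value + intensity) for value in pixel_values]


logger = configure_logging(debug=True)
logger.debug("applying perturbation name=%s intensity=%d", "brightness", 10)
try:
    shift_brightness([10, 20], intensity=999)
except ValueError:
    # logger.exception records the message together with the full traceback.
    logger.exception("perturbation failed name=%s", "brightness")
```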
- Principles:
  - Menus should be informative and aesthetically clear (Rich tables/panels where helpful).
  - When a user error occurs (missing arg, bad path, invalid value), the CLI must display the relevant help/usage inline.
    - Do not merely say “see --help”; show the specific command’s synopsis and key options immediately.
  - Empty invocations should show usage guidance rather than failing silently or with a stack trace.
  - Provide `--dry-run` to preview outputs and side effects without writing files.
- Discoverability:
  - Include `list` and `info` commands to help users explore available perturbations and current configuration.
- Consistency:
  - Ensure CLI messaging mirrors the same terminology as the API and documentation.
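
The "show help inline on user error" principle can be illustrated with a small sketch. Note that it uses the standard library's `argparse` as a stand-in rather than Typer, purely to keep the example dependency-free; `HelpOnErrorParser` is a hypothetical name, and making `--config` required is an assumption made only to trigger an error.

```python
import argparse
import sys
from typing import NoReturn


class HelpOnErrorParser(argparse.ArgumentParser):
    """Parser that prints the full help text, not a one-line hint, on user error."""

    def error(self, message: str) -> NoReturn:
        sys.stderr.write(f"Error: {message}\n\n")
        self.print_help(sys.stderr)  # synopsis and key options, shown immediately
        self.exit(2)


def build_parser() -> argparse.ArgumentParser:
    parser = HelpOnErrorParser(
        prog="advplate generate",
        description="Generate adversarial dataset.",
    )
    # Required here only so that omitting it demonstrates the error path.
    parser.add_argument("--config", required=True, help="Path to YAML/JSON configuration file")
    parser.add_argument("--n_variants", type=int, help="Override number of variants per image pair")
    parser.add_argument(
        "--dry-run",
        action="store_true",
        help="Preview generation plan without creating files",
    )
    return parser


args = build_parser().parse_args(["--config", "config.yaml", "--n_variants", "5"])
print(args.n_variants, args.dry_run)  # 5 False
```

Calling `build_parser().parse_args([])` instead prints the error followed by the full option list to stderr and exits with status 2.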