refactor(tools): generalize infer.py to accept any registered model by idchacon28 · Pull Request #10 · microsoft/MegaDetector-Overhead

idchacon28 · 2026-06-05T19:31:04Z

Generalize `tools/infer.py` to accept any registered model

Why

`tools/infer.py` was hardcoded to the legacy multi-class HerdNet model:

- `from animaloc.models import LossWrapper, HerdNet`
- `from animaloc.eval import HerdNetStitcher, HerdNetEvaluator`
- `assert num_classes == 7` (literal)
- hardcoded `classes = {1:'class_1', ..., 6:'class_6'}`

Result: single-class OWL models (OWLC, OWLT, OWLD_S/B/L/H) could not run inference through this script. The docs routed users to `tools/test.py`, which requires ground-truth annotations and is therefore useless for actual inference on new imagery.

What

Two commits do the work, plus tests and docs:

| Commit | What |
| --- | --- |
| `feat(registry): add animaloc/registry/families.py` | New `FAMILIES` table mapping each registered model to its stitcher, evaluator, default `model_kwargs`, normalization stats, `down_ratio`, and `multi_class` flag. Plus a `resolve_family(name, *, checkpoint_meta, overrides)` helper. Lives under `animaloc/registry/` so any future tool can import it without dragging eval deps into `animaloc.models`. |
| `refactor(tools): generalize infer.py` | Full rewrite. Replaces all HerdNet hardcoding with family-driven lookups. New CLI flags. Backwards-compatible default (`--model HerdNet`). |
| `test(smoke): add tests/smoke_infer.sh` | Bash one-shot that ensures the synthetic dataset + an OWL-C checkpoint exist, runs `tools/infer.py --model OWLC`, and verifies the CSV schema. |
| `docs(infer): update training.md` | Replaces the "for OWL inference use test.py" workaround with the real recipe. |

New CLI surface

python tools/infer.py <images_dir> <model.pth>                # back-compat: HerdNet
python tools/infer.py <images_dir> <model.pth> --model OWLC
python tools/infer.py <images_dir> <model.pth> --model OWLD_L -device cpu \
    --output-dir /tmp/owld_l_results
python tools/infer.py <images_dir> <model.pth> --model OWLT \
    --model-kwarg down_ratio=4

Full flag set: --model, --model-kwarg KEY=VAL (repeatable), --stitcher, --evaluator, --num-classes, --mean, --std, --down-ratio, --lmds-kernel-size, --lmds-adapt-ts, --lmds-neg-ts, --output-dir. Plus the existing -size -over -device -pf -rot --skip-model-inference.
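As an illustration of the repeatable `--model-kwarg KEY=VAL` flag, here is a minimal `argparse` sketch. It is not the code in `tools/infer.py`: the `parse_kwarg` helper and the use of `ast.literal_eval` to coerce values are assumptions, and only a few of the flags listed above are declared.

```python
import argparse
import ast


def parse_kwarg(text: str) -> tuple[str, object]:
    """Split 'KEY=VAL' and coerce VAL to a Python literal when possible."""
    key, sep, raw = text.partition("=")
    if not sep or not key:
        raise argparse.ArgumentTypeError(f"expected KEY=VAL, got {text!r}")
    try:
        value = ast.literal_eval(raw)  # "4" -> 4, "False" -> False
    except (ValueError, SyntaxError):
        value = raw  # anything that is not a literal stays a string
    return key, value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="infer.py")
    parser.add_argument("root")
    parser.add_argument("pth")
    parser.add_argument("--model", default="HerdNet")
    # action="append" is what makes the flag repeatable.
    parser.add_argument("--model-kwarg", action="append", type=parse_kwarg,
                        default=[], metavar="KEY=VAL")
    return parser


args = build_parser().parse_args([
    "images/", "ckpt.pth", "--model", "OWLT",
    "--model-kwarg", "down_ratio=4",
    "--model-kwarg", "pretrained_cnn=False",
])
model_kwargs = dict(args.model_kwarg)
print(model_kwargs)
```

Collecting the pairs into a plain dict makes it easy to merge them over a family's default `model_kwargs` later.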

Resolution order (per setting)

1. `FAMILIES[name]` defaults
2. Checkpoint metadata (`classes`, `mean`, `std` saved by `tools/train.py`)
3. Explicit CLI override

Later sources override earlier ones. `model_kwargs` is merged rather than replaced, so users can override one kwarg without listing every default.
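The layering can be sketched as follows. This is a simplified stand-in, not the contents of `animaloc/registry/families.py`: the single `FAMILIES` entry, the placeholder stitcher/evaluator names, the `down_ratio` default, and the `None` defaults on the keyword arguments are all assumptions made for illustration.

```python
from copy import deepcopy

# Hypothetical, abbreviated stand-in for the real FAMILIES table.
FAMILIES = {
    "OWLT": {
        "stitcher": "ExampleStitcher",      # placeholder name
        "evaluator": "ExampleEvaluator",    # placeholder name
        "model_kwargs": {"pretrained_cnn": False, "down_ratio": 2},
        "mean": (0.485, 0.456, 0.406),      # ImageNet statistics
        "std": (0.229, 0.224, 0.225),
    },
}


def resolve_family(name, *, checkpoint_meta=None, overrides=None):
    """Layer family defaults, checkpoint metadata, then CLI overrides."""
    if name not in FAMILIES:
        known = ", ".join(sorted(FAMILIES))
        raise KeyError(f"unknown family {name!r}; known families: {known}")
    config = deepcopy(FAMILIES[name])  # never mutate the shared table
    for layer in (checkpoint_meta or {}, overrides or {}):
        for key, value in layer.items():
            if value is None:
                continue  # an unset value does not override anything
            if key == "model_kwargs":
                # Merge so one kwarg can change without restating the rest.
                config["model_kwargs"] = {**config["model_kwargs"], **value}
            else:
                config[key] = value
    return config


cfg = resolve_family(
    "OWLT",
    checkpoint_meta={"mean": (0.5, 0.5, 0.5)},
    overrides={"model_kwargs": {"down_ratio": 4}},
)
print(cfg["model_kwargs"], cfg["mean"], cfg["std"])
```

Here the checkpoint's `mean` replaces the family default, `std` falls through unchanged, and `down_ratio=4` is merged into `model_kwargs` while `pretrained_cnn=False` is kept.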

Behavior changes that matter

- Output directory is configurable (`--output-dir`), and the default folder name now includes the model (`20260605_OWLC_results`, not the hardcoded `_HerdNet_results`). The HerdNet default location is unchanged.
- `assert num_classes == 7` is gone. For HerdNet without `classes` metadata, the layer-shape probe (`model.cls_head.2.weight`) is kept as a last-resort fallback. OWL families don't take `num_classes`.
- `state_dict` loading is now `strict=False` with explicit missing/unexpected key warnings, which catches partial-load checkpoints and tells the user what happened.
- The `.map(classes)` + `.dropna()` chain is gone. Detection rows whose label is unmapped now keep the raw label (as a string) in `species` and emit one warning listing the unmapped labels.
- `pretrained=False` is set in every family's `model_kwargs` so the constructor never re-fetches DINOv3 or DLA-34 weights at inference time. The checkpoint's `state_dict` supersedes them.
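The new handling of unmapped labels can be illustrated without pandas. The sketch below is an assumption about the behavior described above, not the script's actual code; `map_species` is a hypothetical helper name.

```python
import warnings


def map_species(labels, classes):
    """Map numeric labels to names; keep unmapped labels as raw strings."""
    species = [classes.get(label, str(label)) for label in labels]
    unmapped = sorted({label for label in labels if label not in classes})
    if unmapped:
        # A single warning for the whole batch, not one per row.
        warnings.warn(f"unmapped labels kept as raw strings: {unmapped}")
    return species


classes = {1: "class_1"}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    species = map_species([1, 2, 1, 3], classes)

print(species)
print(len(caught))
```

Unlike a map-then-drop chain, every detection row survives, so an incomplete `classes` mapping no longer silently removes detections.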

Sanity checks added at startup

One-shot dummy forward on torch.zeros(1, 3, size, size) to detect model/stitcher shape mismatches early. Unwraps LossWrapper.forward's (output, output_dict) tuple and ignores None entries in tuple outputs (e.g. OWLD_S returns (heatmap, None)).
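A rough sketch of the output normalization this check relies on is shown below. It uses plain strings in place of tensors, and it assumes a wrapped result can be recognized as a 2-tuple whose second element is a dict; the real script may detect `LossWrapper` differently. `normalize_outputs` is a hypothetical name.

```python
def normalize_outputs(result):
    """Return the non-None outputs of a forward pass as a flat list."""
    # Wrapped result: (output, output_dict) -> keep only `output`.
    if (isinstance(result, tuple) and len(result) == 2
            and isinstance(result[1], dict)):
        result = result[0]
    if isinstance(result, (tuple, list)):
        # Drop placeholder entries such as the None in (heatmap, None).
        return [item for item in result if item is not None]
    return [result]


print(len(normalize_outputs(("heatmap", None))))                      # single-map model
print(len(normalize_outputs((("heatmap", "clsmap"), {"loss": 0.0}))))  # wrapped two-map model
print(len(normalize_outputs("heatmap")))                              # bare output
```

The resulting count can then be compared with what the selected stitcher expects, so a mismatch is reported at startup with a clear message.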

Smoke validation (CPU, Python 3.11)

| Test | Result |
| --- | --- |
| `tools/infer.py val/ <OWLC_ckpt> --model OWLC -device cpu` | ✅ 1856 detections written to CSV |
| `tools/infer.py val/ <OWLD_S_ckpt> --model OWLD_S -device cpu` | ✅ 2270 detections; full DINOv3 + animaloc end-to-end |
| `tools/infer.py val/ <OWLC_ckpt>` (default HerdNet against OWL ckpt) | ✅ fails cleanly with "4 missing key(s)" warning + actionable LMDS error |
| `./tests/smoke_infer.sh` | ✅ passes (290 detections, schema OK) |
| `uv run mkdocs build --strict` | ✅ no warnings |
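The schema check in the smoke test is done in Bash; an equivalent check in Python might look like the sketch below. The required columns (`images`, `x`, `y`, `labels`) are the ones the smoke-test commit message names; the function name and the sample row are made up for illustration.

```python
import csv
import io

REQUIRED_COLUMNS = {"images", "x", "y", "labels"}


def check_detections_csv(handle):
    """Return the row count, or raise if the schema or content is wrong."""
    reader = csv.DictReader(handle)
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = sum(1 for _ in reader)
    if rows == 0:
        raise ValueError("detections CSV has no rows")
    return rows


# Illustrative, made-up content using the column layout reported for OWLC.
sample = io.StringIO(
    "images,labels,dscores,x,y,count_1,species\n"
    "img_0.jpg,1,0.91,12,34,1,class_1\n"
)
n_rows = check_detections_csv(sample)
print(n_rows)
```

In practice the handle would come from opening the detections CSV under the chosen `--output-dir`.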

Deferred to follow-up PRs

- Clean up the duplicate class definitions in `animaloc/eval/evaluators.py` (the whole file is pasted twice; verified the first-half classes are operationally bound, but it's a mess).
- Add `model_name` / `stitcher_name` / `evaluator_name` to `tools/train.py`'s checkpoint metadata so `--model` could become fully auto-detected.
- Factor out a pure inference path that doesn't use Evaluator with dummy ground truth.

Backwards compatibility

| Old usage | After this PR |
| --- | --- |
| `tools/infer.py images/ ckpt.pth` | Still works. Defaults to `--model HerdNet`. |
| `-size 512 -over 160 -device cuda` (etc.) | Unchanged. |
| `_HerdNet_results` output folder for HerdNet | Unchanged (still uses `_HerdNet_` since `--model` defaults to HerdNet). |

The only intentional break is the removal of `assert num_classes == 7`. Any legacy HerdNet checkpoint with a different `num_classes` will now actually load.

Branch

feat/generalize-infer — 4 commits, all conventional-commits style:

8347984 docs(infer): update training.md for the generalized tools/infer.py
85b66c0 test(smoke): add tests/smoke_infer.sh + extend tests/README
0ab5abc refactor(tools): generalize infer.py to support any registered model
8bb1d25 feat(registry): add animaloc/registry/families.py for deployment defaults

How to test locally

gh pr checkout <this-PR-number>
uv sync
uv run python tools/infer.py --help     # see the new CLI
./tests/smoke_infer.sh                  # end-to-end smoke test

…ults

Introduces a single source of truth for per-model deployment defaults (stitcher, evaluator, model_kwargs, normalization stats, down_ratio, multi_class flag) that tools like `tools/infer.py` need but should not hard-code. Placed under `animaloc/registry/` rather than `animaloc/models/` so the model classes themselves do not pick up an accidental dependency on eval components.

Includes entries for the seven registered models in this repo: HerdNet, OWLC, OWLT, OWLD_S, OWLD_B, OWLD_L, OWLD_H.

The `resolve_family(name, *, checkpoint_meta, overrides)` helper returns the effective config with this resolution order: family defaults -> checkpoint metadata (mean/std/classes saved by tools/train.py) -> explicit CLI overrides (with model_kwargs merged, not replaced).

Notable design choices:

* All families set `pretrained=False` (HerdNet, OWLD_*) and `pretrained_cnn=False` (OWLT) so inference does not re-fetch backbone weights -- the checkpoint state_dict supersedes them.
* Normalization defaults to ImageNet stats. Verified against every OWL training config in this repo (incl. all DINOv3 ViT runs); the user trains with ImageNet stats throughout.

Smoke-tested:

- All 7 family names resolve.
- resolve_family() with checkpoint_meta + overrides correctly merges model_kwargs and overrides scalar fields.
- Unknown family name raises KeyError with an actionable message listing known families and the file to edit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

`tools/infer.py` was hardcoded to the legacy multi-class HerdNet model: hardcoded `from animaloc.models import HerdNet`, hardcoded HerdNet stitcher/evaluator, a non-recoverable `assert num_classes == 7`, and a hardcoded `classes={1:..6:..}` dict. Single-class OWL models (OWLC, OWLT, OWLD_S/B/L/H) could not run inference through this script -- the docs routed users to `tools/test.py` (which needs ground-truth annotations) as a workaround.

This commit rewrites infer.py to look up the model class, stitcher, evaluator, default kwargs, and normalization stats from the `animaloc.registry.families.FAMILIES` table added in the previous commit. Any registered model that has a family entry is usable.

## New CLI surface (all flags optional except positional root + pth)

    --model NAME             from FAMILIES keys; default HerdNet (back-compat)
    --model-kwarg KEY=VAL    override a constructor kwarg (repeatable)
    --stitcher NAME          override family-default stitcher class
    --evaluator NAME         override family-default evaluator class
    --num-classes N          explicit override (HerdNet without metadata only)
    --mean R,G,B             override normalization mean
    --std R,G,B              override normalization std
    --down-ratio N           override down_ratio
    --lmds-kernel-size H,W   LMDS kernel (default 3,3)
    --lmds-adapt-ts FLOAT    LMDS adaptive threshold (default 0.2)
    --lmds-neg-ts FLOAT      LMDS negative threshold (HerdNet family only)
    --output-dir PATH        default <root>/<date>_<model>_results
    -size -over -device -pf -rot --skip-model-inference   (unchanged)

## Resolution order

For each setting: FAMILIES[name] defaults -> checkpoint metadata (`classes`, `mean`, `std`) -> explicit CLI override. `model_kwargs` is merged rather than replaced so users override one kwarg without listing every default.

## Behavior changes that matter

* Output dir is configurable (`--output-dir`) and the folder name now includes the model (e.g. `20260605_OWLC_results`), not the hardcoded `_HerdNet_results`. Default location is unchanged for HerdNet.
* `assert num_classes == 7` is gone. For HerdNet without `classes` metadata, the layer-shape probe (`model.cls_head.2.weight`) is kept as a last-resort fallback. For OWL families, num_classes is not passed (the constructor doesn't accept it).
* `state_dict` loading is now `strict=False` with explicit missing/unexpected key warnings. Catches partial-load checkpoints without crashing immediately, but tells the user what happened.
* `.map(classes) + .dropna()` chain is gone. Detection rows whose label is unmapped now keep the raw label (as string) in `species` and emit a single warning listing unmapped labels.
* `pretrained=False` is set in every family's `model_kwargs` so the constructor never re-fetches DINOv3 or DLA-34 weights at inference time (the checkpoint's state_dict supersedes them).

## Sanity checks added at startup

* One-shot dummy forward on `torch.zeros(1, 3, size, size)` to detect model/stitcher shape mismatches early with a clear error instead of a deep tuple-unpack failure inside LMDS.
* Unwrap `LossWrapper`'s `(output, output_dict)` and ignore `None` entries in tuple outputs (e.g. OWLD_S returns `(heatmap, None)`) before counting outputs.

## Smoke validation

* `tools/infer.py /tmp/owl-smoketest/val/ <OWLC_ckpt> --model OWLC -device cpu` produces a 1856-row detections.csv with columns `images, labels, dscores, x, y, count_1, species`.
* `tools/infer.py /tmp/owl-smoketest/val/ <OWLD_S_ckpt> --model OWLD_S -device cpu` produces 2270 rows via the DINOv3 ViT-S/16 frozen backbone. End-to-end DINOv3 + animaloc inference works.
* `tools/infer.py /tmp/owl-smoketest/val/ <OWLC_ckpt>` (no --model, defaults to HerdNet against an OWL checkpoint) fails cleanly with "4 missing key(s) in state_dict" warning + LMDS shape error.

## Deferred to follow-up

* The Evaluator path is still used as a wrapper because it already implements stitching + LMDS. Ground-truth values are dummy (x=0, y=0, label=1) and metrics are discarded. A future PR can factor out a pure inference function that does not go through Evaluator.
* Adding `model_name` / `stitcher_name` / `evaluator_name` to the checkpoint metadata in `tools/train.py` so `--model` becomes fully auto-detected. Today the user still has to pass `--model` for non-HerdNet checkpoints.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
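For context on the `strict=False` behavior described in this commit message: PyTorch's `load_state_dict(..., strict=False)` returns an object exposing `missing_keys` and `unexpected_keys`. The sketch below reproduces only that set logic in plain Python so it runs without torch. The helper name and the parameter names in the example are hypothetical.

```python
import warnings


def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, as a non-strict load reports."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)      # model wants, ckpt lacks
    unexpected = sorted(checkpoint_keys - model_keys)   # ckpt has, model ignores
    return missing, unexpected


# Hypothetical parameter names, for illustration only.
model_keys = ["backbone.weight", "loc_head.weight", "cls_head.2.weight"]
checkpoint_keys = ["backbone.weight", "loc_head.weight", "extra.bias"]

missing, unexpected = diff_state_dict_keys(model_keys, checkpoint_keys)
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    if missing:
        warnings.warn(f"{len(missing)} missing key(s) in state_dict: {missing}")
    if unexpected:
        warnings.warn(f"{len(unexpected)} unexpected key(s) in state_dict: {unexpected}")

print(missing, unexpected)
```

Reporting both lists up front is what lets a wrong `--model` choice surface as a readable warning before any later shape error.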

The OWL-C training and evaluation smoke runs were already in tests/README.md. This commit adds the inference smoke run that exercises the newly generalized tools/infer.py.

`tests/smoke_infer.sh`:

1. Generates the synthetic dataset at /tmp/owl-smoketest/ if missing
2. Runs the OWL-C training smoke if no checkpoint exists under outputs/
3. Runs `tools/infer.py /tmp/owl-smoketest/val/ <ckpt> --model OWLC -device cpu --output-dir /tmp/owl-smoketest-infer/`
4. Verifies the detections CSV exists, has > 0 rows, and contains the columns `images, x, y, labels`

Exit code 0 on pass, non-zero on any failure. Runs in ~30 seconds on CPU when the training smoke has already produced a checkpoint, or ~90 seconds if it has to train one first.

`tests/README.md`:

- Adds step 6 (./tests/smoke_infer.sh) to the smoke-test sequence
- Documents what the inference smoke script does and what it verifies

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Replaces the previous "tools/infer.py runs the original HerdNet model end-to-end ... For OWL-C / OWL-D / OWL-T inference, use tools/test.py" note with the real story: tools/infer.py now accepts --model <name> for any registered model, including the OWL family.

Adds:

- Per-model invocation examples (HerdNet default, OWLC, OWLD_L, OWLT with --model-kwarg override)
- List of supported --model values
- Pointer to `uv run python tools/infer.py --help` for the full flag set
- Pointer to `animaloc/registry/families.py` for adding new model families

Also updates the "Verifying the install (smoke tests)" section to include `./tests/smoke_infer.sh` as step 5.

Verified `uv run mkdocs build --strict` still succeeds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

microsoft-github-policy-service · 2026-06-05T19:31:14Z

@idchacon28 please read the following Contributor License Agreement (CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

(default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
@microsoft-github-policy-service agree
(when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
@microsoft-github-policy-service agree company="Microsoft"


idchacon28 and others added 4 commits June 5, 2026 19:21

idchacon28 wants to merge 4 commits into main from feat/generalize-infer
