
refactor(tools): generalize infer.py to accept any registered model #10

Draft
idchacon28 wants to merge 4 commits into main from feat/generalize-infer

@idchacon28

Collaborator

Generalize tools/infer.py to accept any registered model

Why

tools/infer.py was hardcoded to the legacy multi-class HerdNet model:

  • from animaloc.models import LossWrapper, HerdNet
  • from animaloc.eval import HerdNetStitcher, HerdNetEvaluator
  • assert num_classes == 7 (literal)
  • hardcoded classes = {1:'class_1', ..., 6:'class_6'}

Result: single-class OWL models (OWLC, OWLT, OWLD_S/B/L/H) could not run inference through this script. The docs routed users to tools/test.py instead, which requires ground-truth annotations and is therefore useless for actual inference on new, unlabeled imagery.

What

Two commits do the work, plus tests and docs:

| Commit | What |
|---|---|
| feat(registry): add `animaloc/registry/families.py` | New `FAMILIES` table mapping each registered model to its stitcher, evaluator, default `model_kwargs`, normalization stats, `down_ratio`, and `multi_class` flag, plus a `resolve_family(name, *, checkpoint_meta, overrides)` helper. Lives under `animaloc/registry/` so any future tool can import it without dragging eval deps into `animaloc.models`. |
| refactor(tools): generalize `infer.py` | Full rewrite. Replaces all HerdNet hardcoding with family-driven lookups. New CLI flags. Backwards-compatible default (`--model HerdNet`). |
| test(smoke): add `tests/smoke_infer.sh` | Bash one-shot that ensures the synthetic dataset and an OWL-C checkpoint exist, runs `tools/infer.py --model OWLC`, and verifies the CSV schema. |
| docs(infer): update `training.md` | Replaces the "for OWL inference use test.py" workaround with the real recipe. |

New CLI surface

python tools/infer.py <images_dir> <model.pth>                # back-compat: HerdNet
python tools/infer.py <images_dir> <model.pth> --model OWLC
python tools/infer.py <images_dir> <model.pth> --model OWLD_L -device cpu \
    --output-dir /tmp/owld_l_results
python tools/infer.py <images_dir> <model.pth> --model OWLT \
    --model-kwarg down_ratio=4

Full flag set: --model, --model-kwarg KEY=VAL (repeatable), --stitcher, --evaluator, --num-classes, --mean, --std, --down-ratio, --lmds-kernel-size, --lmds-adapt-ts, --lmds-neg-ts, --output-dir. Plus the existing -size -over -device -pf -rot --skip-model-inference.
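The repeatable `--model-kwarg KEY=VAL` flag and the `R,G,B` triplets for `--mean`/`--std` are the least obvious parts of that surface. A minimal sketch of how they could be parsed with `argparse`, assuming values are literal-evaluated when possible and kept as strings otherwise; `parse_kwarg`, `parse_triplet`, and `build_parser` are hypothetical names, and only a subset of the real flags is shown.

```python
import argparse
import ast
from typing import Any, Tuple


def parse_kwarg(text: str) -> Tuple[str, Any]:
    """Turn 'KEY=VAL' into (key, value); VAL is literal-evaluated if possible."""
    key, sep, raw = text.partition("=")
    if not sep or not key:
        raise argparse.ArgumentTypeError(f"expected KEY=VAL, got {text!r}")
    try:
        value = ast.literal_eval(raw)  # "4" -> 4, "False" -> False, "0.5" -> 0.5
    except (ValueError, SyntaxError):
        value = raw                    # anything else stays a plain string
    return key, value


def parse_triplet(text: str) -> Tuple[float, float, float]:
    """Parse 'R,G,B' for --mean / --std."""
    parts = [float(p) for p in text.split(",")]
    if len(parts) != 3:
        raise argparse.ArgumentTypeError(f"expected R,G,B, got {text!r}")
    return parts[0], parts[1], parts[2]


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the tools/infer.py flags.
    parser = argparse.ArgumentParser(prog="infer.py")
    parser.add_argument("root")
    parser.add_argument("pth")
    parser.add_argument("--model", default="HerdNet")  # back-compat default
    parser.add_argument("--model-kwarg", type=parse_kwarg, action="append",
                        default=None, metavar="KEY=VAL")
    parser.add_argument("--mean", type=parse_triplet)
    parser.add_argument("--std", type=parse_triplet)
    parser.add_argument("--down-ratio", type=int)
    parser.add_argument("--output-dir")
    return parser
```

`action="append"` collects one `(key, value)` pair per occurrence, so `dict(args.model_kwarg or [])` yields the override mapping; later occurrences of the same key win.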

Resolution order (per setting)

  1. FAMILIES[name] defaults
  2. Checkpoint metadata (classes, mean, std saved by tools/train.py)
  3. Explicit CLI override

model_kwargs is merged rather than replaced, so users override one kwarg without listing every default.
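A minimal sketch of that resolution order, assuming each `FAMILIES` entry is a small frozen dataclass. The `Family` field names, the single illustrative `HerdNet` entry, and `down_ratio=2` are assumptions for illustration, not the contents of `animaloc/registry/families.py`; the ImageNet mean/std values are the standard ones.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, Mapping, Optional, Tuple

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


@dataclass(frozen=True)
class Family:
    """Hypothetical shape of one FAMILIES entry (field names are assumptions)."""
    stitcher: str
    evaluator: str
    model_kwargs: Dict[str, Any] = field(default_factory=dict)
    mean: Tuple[float, float, float] = IMAGENET_MEAN
    std: Tuple[float, float, float] = IMAGENET_STD
    down_ratio: int = 2
    multi_class: bool = False


# Illustrative single entry; the real table covers all seven models.
FAMILIES: Dict[str, Family] = {
    "HerdNet": Family("HerdNetStitcher", "HerdNetEvaluator",
                      model_kwargs={"pretrained": False}, multi_class=True),
}


def resolve_family(name: str, *, checkpoint_meta: Optional[Mapping[str, Any]] = None,
                   overrides: Optional[Mapping[str, Any]] = None) -> Dict[str, Any]:
    if name not in FAMILIES:
        known = ", ".join(sorted(FAMILIES))
        raise KeyError(f"Unknown model {name!r}. Known families: {known}. "
                       "Add an entry in animaloc/registry/families.py.")
    resolved = asdict(FAMILIES[name])           # 1. family defaults (deep copy)
    for key in ("classes", "mean", "std"):      # 2. checkpoint metadata
        if checkpoint_meta and key in checkpoint_meta:
            resolved[key] = checkpoint_meta[key]
    overrides = dict(overrides or {})
    kwarg_overrides = overrides.pop("model_kwargs", None) or {}
    for key, value in overrides.items():        # 3. explicit CLI overrides
        if value is not None:                   #    None means "flag not given"
            resolved[key] = value
    # Merge rather than replace, so one override keeps every other default.
    resolved["model_kwargs"] = {**resolved["model_kwargs"], **kwarg_overrides}
    return resolved
```

Because `asdict` copies the entry, per-run overrides never mutate the shared table.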

Behavior changes that matter

  • The output directory is configurable (--output-dir), and the default folder name now includes the model (e.g. 20260605_OWLC_results) instead of a hardcoded _HerdNet_results. For HerdNet the default location is unchanged.
  • assert num_classes == 7 is gone. For HerdNet without classes metadata, the layer-shape probe (model.cls_head.2.weight) is kept as a last-resort fallback. OWL family constructors do not accept num_classes, so it is not passed to them.
  • state_dict loading is now strict=False with explicit missing/unexpected key warnings, so partial-load checkpoints no longer crash at load time and the user sees exactly which keys were skipped.
  • The .map(classes) + .dropna() chain is gone. Detection rows whose label is unmapped now keep the raw label (as string) in species and emit one warning listing unmapped labels.
  • Every family's model_kwargs disables pretrained backbones (pretrained=False; pretrained_cnn=False for OWLT), so the constructor never re-fetches DINOv3 or DLA-34 weights at inference time. The checkpoint's state_dict supersedes them.
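The unmapped-label behavior replaces a silent `.dropna()` with a visible fallback. A minimal stdlib sketch of the idea, using plain lists instead of the DataFrame the script works on; `label_species` is a hypothetical name.

```python
import warnings
from typing import Dict, Iterable, List


def label_species(labels: Iterable[int], classes: Dict[int, str]) -> List[str]:
    """Map numeric labels to species names, keeping unmapped labels as strings."""
    species: List[str] = []
    unmapped = set()
    for label in labels:
        name = classes.get(label)
        if name is None:
            unmapped.add(label)
            name = str(label)  # keep the raw label instead of dropping the row
        species.append(name)
    if unmapped:
        # A single warning for the whole batch, not one per detection row.
        warnings.warn(f"No class name for label(s) {sorted(unmapped)}; "
                      "kept raw labels in 'species'.")
    return species
```

On a pandas DataFrame, roughly the same effect is `df["labels"].map(classes).fillna(df["labels"].astype(str))`, with the warning computed from the rows where `map` produced NaN.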

Sanity checks added at startup

  • One-shot dummy forward on torch.zeros(1, 3, size, size) to detect model/stitcher shape mismatches early. Unwraps LossWrapper.forward's (output, output_dict) tuple and ignores None entries in tuple outputs (e.g. OWLD_S returns (heatmap, None)).
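The output-normalization part of that startup check can be sketched without torch. This assumes, hypothetically, that a multi-class family produces two maps (heatmap plus class map) and a single-class family produces one; in the real script the input would be `torch.zeros(1, 3, size, size)` run under `torch.no_grad()`. `unwrap_outputs` and `check_output_count` are illustrative names.

```python
from typing import Any, Tuple


def unwrap_outputs(result: Any) -> Tuple[Any, ...]:
    """Normalize a forward() result to a tuple of non-None outputs."""
    # LossWrapper.forward returns (output, output_dict); keep only the output.
    if isinstance(result, tuple) and len(result) == 2 and isinstance(result[1], dict):
        result = result[0]
    outputs = result if isinstance(result, (tuple, list)) else (result,)
    # Drop placeholder entries, e.g. OWLD_S returns (heatmap, None).
    return tuple(o for o in outputs if o is not None)


def check_output_count(result: Any, multi_class: bool) -> None:
    """Fail early with a readable message instead of deep inside LMDS."""
    expected = 2 if multi_class else 1  # assumption: heatmap (+ class map)
    got = len(unwrap_outputs(result))
    if got != expected:
        raise RuntimeError(
            f"Model returned {got} output(s) but the stitcher expects {expected}; "
            "check --model / --stitcher.")
```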

Smoke validation (CPU, Python 3.11)

| Test | Result |
|---|---|
| `tools/infer.py val/ <OWLC_ckpt> --model OWLC -device cpu` | ✅ 1856 detections written to CSV |
| `tools/infer.py val/ <OWLD_S_ckpt> --model OWLD_S -device cpu` | ✅ 2270 detections; full DINOv3 + animaloc end-to-end |
| `tools/infer.py val/ <OWLC_ckpt>` (default HerdNet against OWL ckpt) | ✅ fails cleanly with "4 missing key(s)" warning + actionable LMDS error |
| `./tests/smoke_infer.sh` | ✅ passes (290 detections, schema OK) |
| `uv run mkdocs build --strict` | ✅ no warnings |

Deferred to follow-up PRs

  1. Clean up the duplicate class definitions in animaloc/eval/evaluators.py (the whole file is pasted twice; verified the first-half classes are operationally bound, but it's a mess).
  2. Add model_name / stitcher_name / evaluator_name to tools/train.py's checkpoint metadata so --model could become fully auto-detected.
  3. Factor out a pure inference path that doesn't use Evaluator with dummy ground truth.

Backwards compatibility

| Old usage | After this PR |
|---|---|
| `tools/infer.py images/ ckpt.pth` | Still works. Defaults to `--model HerdNet`. |
| `-size 512 -over 160 -device cuda` (etc.) | Unchanged. |
| `_HerdNet_results` output folder for HerdNet | Unchanged (still uses `_HerdNet_` since `--model` defaults to HerdNet). |

The only intentional break is the removal of assert num_classes == 7: a legacy HerdNet checkpoint with a different num_classes now loads instead of failing the assertion.
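A minimal sketch of how num_classes could be resolved for HerdNet once the assert is gone: checkpoint metadata first, then `--num-classes` (which the flag list describes as "HerdNet without metadata only"), then the layer-shape probe as a last resort. Two details are assumptions, not confirmed by this PR: that the state dict is stored under a `model_state_dict` key, and that num_classes counts the background class (consistent with the old `== 7` check for six species). `resolve_num_classes` is a hypothetical name.

```python
from typing import Any, Mapping, Optional

# Last conv of HerdNet's classification head; the "model." prefix comes from
# saving through LossWrapper.
CLS_HEAD_KEY = "model.cls_head.2.weight"


def resolve_num_classes(checkpoint: Mapping[str, Any],
                        cli_num_classes: Optional[int] = None) -> int:
    """num_classes for a HerdNet checkpoint, counting the background class."""
    classes = checkpoint.get("classes")
    if classes:                      # metadata written by tools/train.py
        return len(classes) + 1      # e.g. 6 species -> 7
    if cli_num_classes is not None:  # --num-classes, only consulted without metadata
        return cli_num_classes
    # Assumed checkpoint layout: weights under "model_state_dict".
    weight = checkpoint.get("model_state_dict", {}).get(CLS_HEAD_KEY)
    if weight is None:
        raise ValueError(
            "Cannot infer num_classes: no 'classes' metadata, no --num-classes, "
            f"and no {CLS_HEAD_KEY!r} in the state_dict.")
    return int(weight.shape[0])      # out_channels = one per class
```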

Branch

feat/generalize-infer — 4 commits, all conventional-commits style:

8347984 docs(infer): update training.md for the generalized tools/infer.py
85b66c0 test(smoke): add tests/smoke_infer.sh + extend tests/README
0ab5abc refactor(tools): generalize infer.py to support any registered model
8bb1d25 feat(registry): add animaloc/registry/families.py for deployment defaults

How to test locally

gh pr checkout <this-PR-number>
uv sync
uv run python tools/infer.py --help     # see the new CLI
./tests/smoke_infer.sh                  # end-to-end smoke test

idchacon28 and others added 4 commits June 5, 2026 19:21
…ults

Introduces a single source of truth for per-model deployment defaults
(stitcher, evaluator, model_kwargs, normalization stats, down_ratio,
multi_class flag) that tools like `tools/infer.py` need but should not
hard-code.

Placed under `animaloc/registry/` rather than `animaloc/models/` so the
model classes themselves do not pick up an accidental dependency on
eval components.

Includes entries for the seven registered models in this repo:
  HerdNet, OWLC, OWLT, OWLD_S, OWLD_B, OWLD_L, OWLD_H

The `resolve_family(name, *, checkpoint_meta, overrides)` helper
returns the effective config with this resolution order:
  family defaults  ->  checkpoint metadata (mean/std/classes saved by
  tools/train.py)  ->  explicit CLI overrides (with model_kwargs
  merged, not replaced).

Notable design choices:
* All families set `pretrained=False` (HerdNet, OWLD_*) or
  `pretrained_cnn=False` (OWLT) so inference does not re-fetch
  backbone weights -- the checkpoint state_dict supersedes them.
* Normalization defaults to ImageNet stats. Verified against every
  OWL training config in this repo (incl. all DINOv3 ViT runs); all of
  them train with ImageNet stats.

Smoke-tested:
  - All 7 family names resolve.
  - resolve_family() with checkpoint_meta + overrides correctly merges
    model_kwargs and overrides scalar fields.
  - Unknown family name raises KeyError with an actionable message
    listing known families and the file to edit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

refactor(tools): generalize infer.py to support any registered model

`tools/infer.py` was hardcoded to the legacy multi-class HerdNet model:
a fixed `from animaloc.models import HerdNet`, the HerdNet
stitcher/evaluator, a non-recoverable `assert num_classes == 7`, and a
fixed `classes={1:..6:..}` dict. Single-class OWL models (OWLC,
OWLT, OWLD_S/B/L/H) could not run inference through this script -- the
docs routed users to `tools/test.py` (which needs ground-truth
annotations) as a workaround.

This commit rewrites infer.py to look up the model class, stitcher,
evaluator, default kwargs, and normalization stats from the
`animaloc.registry.families.FAMILIES` table added in the previous
commit. Any registered model that has a family entry is usable.

## New CLI surface (all flags optional except positional root + pth)

  --model NAME            from FAMILIES keys; default HerdNet (back-compat)
  --model-kwarg KEY=VAL   override a constructor kwarg (repeatable)
  --stitcher NAME         override family-default stitcher class
  --evaluator NAME        override family-default evaluator class
  --num-classes N         explicit override (HerdNet without metadata only)
  --mean R,G,B            override normalization mean
  --std R,G,B             override normalization std
  --down-ratio N          override down_ratio
  --lmds-kernel-size H,W  LMDS kernel (default 3,3)
  --lmds-adapt-ts FLOAT   LMDS adaptive threshold (default 0.2)
  --lmds-neg-ts FLOAT     LMDS negative threshold (HerdNet family only)
  --output-dir PATH       default <root>/<date>_<model>_results
  -size -over -device -pf -rot --skip-model-inference  (unchanged)
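The `--output-dir` default above can be sketched as follows, assuming the date stamp is `YYYYMMDD` as in the `20260605_OWLC_results` example; `resolve_output_dir` is a hypothetical name.

```python
import os
from datetime import date
from typing import Optional


def resolve_output_dir(root: str, model: str, output_dir: Optional[str] = None,
                       today: Optional[date] = None) -> str:
    """Honor --output-dir, else fall back to <root>/<YYYYMMDD>_<model>_results."""
    if output_dir:
        return output_dir
    stamp = (today or date.today()).strftime("%Y%m%d")
    return os.path.join(root, f"{stamp}_{model}_results")
```

With `--model` defaulting to `HerdNet`, the fallback reproduces the old `<date>_HerdNet_results` folder, which is what keeps the HerdNet default location unchanged.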

## Resolution order

For each setting: FAMILIES[name] defaults  ->  checkpoint metadata
(`classes`, `mean`, `std`)  ->  explicit CLI override. `model_kwargs`
is merged rather than replaced so users override one kwarg without
listing every default.

## Behavior changes that matter

* Output dir is configurable (`--output-dir`) and the folder name now
  includes the model (e.g. `20260605_OWLC_results`), not the hardcoded
  `_HerdNet_results`. Default location is unchanged for HerdNet.
* `assert num_classes == 7` is gone. For HerdNet without `classes`
  metadata, the layer-shape probe (`model.cls_head.2.weight`) is kept
  as a last-resort fallback. For OWL families, num_classes is not
  passed (the constructor doesn't accept it).
* `state_dict` loading is now `strict=False` with explicit
  missing/unexpected key warnings. Catches partial-load checkpoints
  without crashing immediately, but tells the user what happened.
* The `.map(classes)` + `.dropna()` chain is gone. Detection rows whose
  label is unmapped now keep the raw label (as a string) in `species`
  and emit a single warning listing the unmapped labels.
* Every family's `model_kwargs` disables pretrained backbones
  (`pretrained=False`; `pretrained_cnn=False` for OWLT), so the
  constructor never re-fetches DINOv3 or DLA-34 weights at inference
  time (the checkpoint's state_dict supersedes them).

## Sanity checks added at startup

* One-shot dummy forward on `torch.zeros(1, 3, size, size)` to detect
  model/stitcher shape mismatches early with a clear error instead of
  a deep tuple-unpack failure inside LMDS.
* Unwrap `LossWrapper`'s `(output, output_dict)` and ignore `None`
  entries in tuple outputs (e.g. OWLD_S returns `(heatmap, None)`)
  before counting outputs.

## Smoke validation

* `tools/infer.py /tmp/owl-smoketest/val/ <OWLC_ckpt> --model OWLC
  -device cpu` produces a 1856-row detections.csv with columns
  `images, labels, dscores, x, y, count_1, species`.
* `tools/infer.py /tmp/owl-smoketest/val/ <OWLD_S_ckpt>
  --model OWLD_S -device cpu` produces 2270 rows via the DINOv3
  ViT-S/16 frozen backbone. End-to-end DINOv3 + animaloc inference
  works.
* `tools/infer.py /tmp/owl-smoketest/val/ <OWLC_ckpt>` (no --model,
  defaults to HerdNet against an OWL checkpoint) fails cleanly with
  "4 missing key(s) in state_dict" warning + LMDS shape error.

## Deferred to follow-up

* The Evaluator path is still used as a wrapper because it already
  implements stitching + LMDS. Ground-truth values are dummy (x=0,
  y=0, label=1) and metrics are discarded. A future PR can factor out
  a pure inference function that does not go through Evaluator.
* Adding `model_name` / `stitcher_name` / `evaluator_name` to the
  checkpoint metadata in `tools/train.py` so `--model` becomes fully
  auto-detected. Today the user still has to pass `--model` for
  non-HerdNet checkpoints.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

test(smoke): add tests/smoke_infer.sh + extend tests/README

The OWL-C training and evaluation smoke runs were already in
tests/README.md. This commit adds the inference smoke run that
exercises the newly generalized tools/infer.py.

`tests/smoke_infer.sh`:
  1. Generates the synthetic dataset at /tmp/owl-smoketest/ if missing
  2. Runs the OWL-C training smoke if no checkpoint exists under outputs/
  3. Runs `tools/infer.py /tmp/owl-smoketest/val/ <ckpt> --model OWLC
     -device cpu --output-dir /tmp/owl-smoketest-infer/`
  4. Verifies the detections CSV exists, has > 0 rows, and contains
     the columns `images, x, y, labels`

Exit code 0 on pass, non-zero on any failure. Runs in ~30 seconds on
CPU when the training smoke has already produced a checkpoint, or
~90 seconds if it has to train one first.
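Step 4 of the script can be expressed as a small Python check. This is a sketch of the same verification (file exists, at least one data row, required columns present), not the contents of `tests/smoke_infer.sh`; `check_detections_csv` is a hypothetical name.

```python
import csv
from pathlib import Path
from typing import Iterable

REQUIRED_COLUMNS = ("images", "x", "y", "labels")


def check_detections_csv(path: Path, required: Iterable[str] = REQUIRED_COLUMNS) -> int:
    """Return the number of detection rows; raise if the CSV is unusable."""
    if not path.is_file():
        raise FileNotFoundError(f"no detections CSV at {path}")
    with path.open(newline="") as fh:
        reader = csv.DictReader(fh)
        header = reader.fieldnames or []  # None for a completely empty file
        missing = [c for c in required if c not in header]
        if missing:
            raise ValueError(f"{path} is missing columns: {missing}")
        n_rows = sum(1 for _ in reader)
    if n_rows == 0:
        raise ValueError(f"{path} has a header but no detection rows")
    return n_rows
```

A non-zero exit on failure then falls out naturally: an uncaught exception makes `python` exit with status 1.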

`tests/README.md`:
  - Adds step 6 (./tests/smoke_infer.sh) to the smoke-test sequence
  - Documents what the inference smoke script does and what it verifies

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

docs(infer): update training.md for the generalized tools/infer.py

Replaces the previous "tools/infer.py runs the original HerdNet model
end-to-end ... For OWL-C / OWL-D / OWL-T inference, use tools/test.py"
note with the real story: tools/infer.py now accepts --model <name>
for any registered model, including the OWL family.

Adds:
  - Per-model invocation examples (HerdNet default, OWLC, OWLD_L,
    OWLT with --model-kwarg override)
  - List of supported --model values
  - Pointer to `uv run python tools/infer.py --help` for the full
    flag set
  - Pointer to `animaloc/registry/families.py` for adding new model
    families

Also updates the "Verifying the install (smoke tests)" section to
include `./tests/smoke_infer.sh` as step 5.

Verified `uv run mkdocs build --strict` still succeeds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@microsoft-github-policy-service


@idchacon28 please read the following Contributor License Agreement(CLA). If you agree with the CLA, please reply with the following information.

@microsoft-github-policy-service agree [company="{your company}"]

Options:

  • (default - no company specified) I have sole ownership of intellectual property rights to my Submissions and I am not making Submissions in the course of work for my employer.
    @microsoft-github-policy-service agree
  • (when company given) I am making Submissions in the course of work for my employer (or my employer has intellectual property rights in my Submissions by contract or applicable law). I have permission from my employer to make Submissions and enter into this Agreement on behalf of my employer. By signing below, the defined term “You” includes me and my employer.
    @microsoft-github-policy-service agree company="Microsoft"
Contribution License Agreement

This Contribution License Agreement (“Agreement”) is agreed to by the party signing below (“You”),
and conveys certain license rights to Microsoft Corporation and its affiliates (“Microsoft”) for Your
contributions to Microsoft open source projects. This Agreement is effective as of the latest signature
date below.

  1. Definitions.
    “Code” means the computer software code, whether in human-readable or machine-executable form,
    that is delivered by You to Microsoft under this Agreement.
    “Project” means any of the projects owned or managed by Microsoft and offered under a license
    approved by the Open Source Initiative (www.opensource.org).
    “Submit” is the act of uploading, submitting, transmitting, or distributing code or other content to any
    Project, including but not limited to communication on electronic mailing lists, source code control
    systems, and issue tracking systems that are managed by, or on behalf of, the Project for the purpose of
    discussing and improving that Project, but excluding communication that is conspicuously marked or
    otherwise designated in writing by You as “Not a Submission.”
    “Submission” means the Code and any other copyrightable material Submitted by You, including any
    associated comments and documentation.
  2. Your Submission. You must agree to the terms of this Agreement before making a Submission to any
    Project. This Agreement covers any and all Submissions that You, now or in the future (except as
    described in Section 4 below), Submit to any Project.
  3. Originality of Work. You represent that each of Your Submissions is entirely Your original work.
    Should You wish to Submit materials that are not Your original work, You may Submit them separately
    to the Project if You (a) retain all copyright and license information that was in the materials as You
    received them, (b) in the description accompanying Your Submission, include the phrase “Submission
    containing materials of a third party:” followed by the names of the third party and any licenses or other
    restrictions of which You are aware, and (c) follow any other instructions in the Project’s written
    guidelines concerning Submissions.
  4. Your Employer. References to “employer” in this Agreement include Your employer or anyone else
    for whom You are acting in making Your Submission, e.g. as a contractor, vendor, or agent. If Your
    Submission is made in the course of Your work for an employer or Your employer has intellectual
    property rights in Your Submission by contract or applicable law, You must secure permission from Your
    employer to make the Submission before signing this Agreement. In that case, the term “You” in this
    Agreement will refer to You and the employer collectively. If You change employers in the future and
    desire to Submit additional Submissions for the new employer, then You agree to sign a new Agreement
    and secure permission from the new employer before Submitting those Submissions.
  5. Licenses.
  • Copyright License. You grant Microsoft, and those who receive the Submission directly or
    indirectly from Microsoft, a perpetual, worldwide, non-exclusive, royalty-free, irrevocable license in the
    Submission to reproduce, prepare derivative works of, publicly display, publicly perform, and distribute
    the Submission and such derivative works, and to sublicense any or all of the foregoing rights to third
    parties.
  • Patent License. You grant Microsoft, and those who receive the Submission directly or
    indirectly from Microsoft, a perpetual, worldwide, non-exclusive, royalty-free, irrevocable license under
    Your patent claims that are necessarily infringed by the Submission or the combination of the
    Submission with the Project to which it was Submitted to make, have made, use, offer to sell, sell and
    import or otherwise dispose of the Submission alone or with the Project.
  • Other Rights Reserved. Each party reserves all rights not expressly granted in this Agreement.
    No additional licenses or rights whatsoever (including, without limitation, any implied licenses) are
    granted by implication, exhaustion, estoppel or otherwise.
  6. Representations and Warranties. You represent that You are legally entitled to grant the above
    licenses. You represent that each of Your Submissions is entirely Your original work (except as You may
    have disclosed under Section 3). You represent that You have secured permission from Your employer to
    make the Submission in cases where Your Submission is made in the course of Your work for Your
    employer or Your employer has intellectual property rights in Your Submission by contract or applicable
    law. If You are signing this Agreement on behalf of Your employer, You represent and warrant that You
    have the necessary authority to bind the listed employer to the obligations contained in this Agreement.
    You are not expected to provide support for Your Submission, unless You choose to do so. UNLESS
    REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING, AND EXCEPT FOR THE WARRANTIES
    EXPRESSLY STATED IN SECTIONS 3, 4, AND 6, THE SUBMISSION PROVIDED UNDER THIS AGREEMENT IS
    PROVIDED WITHOUT WARRANTY OF ANY KIND, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTY OF
    NONINFRINGEMENT, MERCHANTABILITY, OR FITNESS FOR A PARTICULAR PURPOSE.
  7. Notice to Microsoft. You agree to notify Microsoft in writing of any facts or circumstances of which
    You later become aware that would make Your representations in this Agreement inaccurate in any
    respect.
  8. Information about Submissions. You agree that contributions to Projects and information about
    contributions may be maintained indefinitely and disclosed publicly, including Your name and other
    information that You submit with Your Submission.
  9. Governing Law/Jurisdiction. This Agreement is governed by the laws of the State of Washington, and
    the parties consent to exclusive jurisdiction and venue in the federal courts sitting in King County,
    Washington, unless no federal subject matter jurisdiction exists, in which case the parties consent to
    exclusive jurisdiction and venue in the Superior Court of King County, Washington. The parties waive all
    defenses of lack of personal jurisdiction and forum non-conveniens.
  10. Entire Agreement/Assignment. This Agreement is the entire agreement between the parties, and
    supersedes any and all prior agreements, understandings or communications, written or oral, between
    the parties relating to the subject matter hereof. This Agreement may be assigned by Microsoft.
