test: 36 tests for lib/eval/entity_coverage.py by hai-pilgrim · Pull Request #21 · heiervang-technologies/supercompact

hai-pilgrim · 2026-03-29T03:29:25Z

Summary

Adds tests/test_eval_entity_coverage.py with 36 pytest tests for lib/eval/entity_coverage.py
Covers EntitySet dataclass: total_count property, all_entities() method, default empty dict
Tests ENTITY_TYPES constant: required keys present, all weights positive, file_path/error at highest weight
Tests extract_entities for: exceptions (ValueError, ModuleNotFoundError), HTTPS/HTTP URLs, port numbers (colon and keyword forms, range filtering), absolute file paths, CamelCase class names, pip/npm packages, HTTP status codes (404, 500)
Tests compute_coverage: empty-suffix → (1.0, 1.0, {}), empty-kept → 0.0, identical sets → 1.0, breakdown structure, half-covered, type-mismatch → 0.0, weighted vs unweighted divergence by type importance
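The compute_coverage edge cases listed above can be illustrated with a small stand-in. This is a minimal sketch, not the code in lib/eval/entity_coverage.py: the function name coverage_sketch, the dict-of-sets input shape, the WEIGHTS values, and the breakdown keys are all assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

# Hypothetical weights; the real ENTITY_TYPES table is not shown in this PR.
WEIGHTS: Dict[str, float] = {"file_path": 3.0, "error": 3.0, "url": 1.0}

EntityMap = Dict[str, Set[str]]


def coverage_sketch(
    suffix: EntityMap, kept: EntityMap, weights: Dict[str, float] = WEIGHTS
) -> Tuple[float, float, dict]:
    """Return (unweighted, weighted, breakdown) coverage of suffix entities by kept."""
    total = sum(len(values) for values in suffix.values())
    if total == 0:
        # Nothing to cover, so coverage is trivially complete.
        return 1.0, 1.0, {}

    breakdown = {}
    covered = 0
    weighted_covered = 0.0
    weighted_total = 0.0
    for etype, wanted in suffix.items():
        if not wanted:
            continue
        # Entities only count when they match within the same type.
        found = wanted & kept.get(etype, set())
        weight = weights.get(etype, 1.0)
        breakdown[etype] = {"total": len(wanted), "covered": len(found)}
        covered += len(found)
        weighted_covered += weight * len(found)
        weighted_total += weight * len(wanted)
    return covered / total, weighted_covered / weighted_total, breakdown
```

With these assumed weights, keeping the one file path but dropping the one URL gives 0.5 unweighted and 0.75 weighted coverage, which is the kind of divergence the last test case targets.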

Test plan

All 36 tests pass via uv run pytest tests/test_eval_entity_coverage.py
Pure-logic tests — regex extraction and set arithmetic only
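As an illustration of what a pure-logic regex test looks like, here is a minimal sketch of port extraction with range filtering. The pattern, the helper name extract_ports, and the 1 to 65535 bound are assumptions; the actual patterns and limits in extract_entities are not shown in this PR.

```python
import re

# Hypothetical pattern: matches ":8080" (colon form) and "port 8080" (keyword form).
PORT_RE = re.compile(r"(?::|\bport\s+)(\d{2,5})\b", re.IGNORECASE)


def extract_ports(text: str) -> set:
    """Collect port-like numbers, dropping values outside an assumed valid range."""
    ports = set()
    for match in PORT_RE.finditer(text):
        number = int(match.group(1))
        if 1 <= number <= 65535:  # assumed bound, not taken from the source
            ports.add(match.group(1))
    return ports


def test_colon_form():
    assert extract_ports("listening on localhost:8080") == {"8080"}


def test_keyword_form():
    assert extract_ports("Postgres runs on port 5432") == {"5432"}


def test_out_of_range_is_filtered():
    assert extract_ports("bad address host:99999") == set()
```

Tests of this shape need no fixtures, network access, or API keys, so they run quickly under pytest.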

🤖 Opened by hai-pilgrim as part of the Pilgrim wandering-agent contribution run.

…ompaction

- install.sh now auto-configures ~/.claude/settings.json (creates pluginDirs entry, idempotent across create/add/already-exists cases)
- uninstall.sh now cleans up the settings.json pluginDirs entry
- Add compact-session.sh: self-contained script that finds JSONL, runs supercompact, backs up original, replaces, and reports results
- Simplify /supercompact command from 5-step multi-bash prompt to single script call with CLAUDE_PLUGIN_ROOT fallback to hardcoded install path
- Simplify PreCompact hook to backup-only (removes wasted supercompact run that Claude's LLM compaction immediately overwrites)
- Update README: accurate hook description, file tree with compact-session.sh, update/upgrade docs, standalone binary limitations clearly stated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
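The idempotent settings update described for install.sh can be sketched as follows. install.sh itself is a shell script whose contents are not shown here; this Python version only illustrates the create/add/already-exists logic, and it assumes pluginDirs is a JSON list of directory paths. The function name ensure_plugin_dir is hypothetical.

```python
import json
from pathlib import Path


def ensure_plugin_dir(settings_path: Path, plugin_dir: str) -> str:
    """Add plugin_dir to pluginDirs once; return which of the three cases applied."""
    existed = settings_path.exists()
    settings = json.loads(settings_path.read_text()) if existed else {}

    # Assumption: pluginDirs is a list of paths.
    dirs = settings.setdefault("pluginDirs", [])
    if plugin_dir in dirs:
        return "already-exists"  # nothing to write, so reruns are harmless

    dirs.append(plugin_dir)
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return "add" if existed else "create"
```

Running the function twice with the same arguments leaves the file unchanged on the second call, which is the property the commit message calls idempotent.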

export_json: file creation, empty array, result structure (method, budget, model_key, composite, ndcg), speed/token counts, dimension scores with score and probe_count, multiple results, valid JSON. export_trace: file creation, path location, filename contains method/budget, JSON has method/budget, empty answers → empty entries, matching probe included, unmatched answer skipped, auto-creates trace dir.

Tests cover ProbeAnswer/JudgeResult dataclass defaults and field storage, ANSWER_MODELS/JUDGE_MODEL constants, generate_answers with empty probe set, score_answers missing-probe path (no API call), missing OPENROUTER_API_KEY error, and _score_one_answer JSON parsing, markdown fence stripping, score clamping, and bad-JSON fallback.
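The parsing behaviours named for _score_one_answer (fence stripping, score clamping, bad-JSON fallback) can be sketched like this. The helper names, the "score" key, the 0.0 to 1.0 bounds, and the fallback value are assumptions for illustration; the real function's signature and score range are not given in this PR.

```python
import json

FENCE = "`" * 3  # a markdown code fence, built this way to keep the example readable


def strip_fence(raw: str) -> str:
    """Remove a surrounding markdown fence and optional language tag, if present."""
    text = raw.strip()
    if len(text) >= 2 * len(FENCE) and text.startswith(FENCE) and text.endswith(FENCE):
        text = text[len(FENCE):-len(FENCE)]
        first_line, newline, rest = text.partition("\n")
        if newline and first_line.strip().isalpha():
            text = rest  # drop a tag such as "json"
    return text.strip()


def parse_score(raw: str, lo: float = 0.0, hi: float = 1.0, fallback: float = 0.0) -> float:
    """Parse a judge reply into a clamped score; return fallback on malformed input."""
    try:
        data = json.loads(strip_fence(raw))
        score = float(data["score"])
    except (ValueError, KeyError, TypeError):
        # json.JSONDecodeError is a ValueError subclass, so bad JSON lands here.
        return fallback
    return max(lo, min(hi, score))
```

None of this touches the network, which is why such cases can be tested without an OPENROUTER_API_KEY.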

Tests cover DIFFICULTY_WEIGHTS constant, DimensionScore/AggregateResult dataclass defaults and fields, dimension_map property, _dcg (empty, single, sorting by weight, zero scores, position discounting), and aggregate() (empty answers, single/multiple models, score 0-1 normalisation, missing probes skipped, empty-dimension zero-mean, perfect/zero/partial NDCG).
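The _dcg and NDCG cases above follow the usual discounted cumulative gain pattern. The sketch below is one plausible reading, assuming each item is a (weight, score) pair, items are ranked by weight in descending order, and the gain at rank i is weight × score / log2(i + 2). The function names and that exact formula are assumptions, not the repository's implementation.

```python
import math
from typing import List, Tuple

Item = Tuple[float, float]  # (difficulty weight, score in [0, 1])


def dcg(items: List[Item]) -> float:
    """Discounted cumulative gain with items ranked by weight, heaviest first."""
    ranked = sorted(items, key=lambda pair: pair[0], reverse=True)
    return sum(w * s / math.log2(i + 2) for i, (w, s) in enumerate(ranked))


def ndcg(items: List[Item]) -> float:
    """Normalise by the DCG obtained if every item scored a perfect 1.0."""
    ideal = dcg([(w, 1.0) for w, _ in items])
    return dcg(items) / ideal if ideal > 0 else 0.0
```

Under these assumptions an empty list yields 0, a single item contributes weight × score undiscounted, and perfect, zero, and partial scores give an NDCG of 1.0, 0.0, and something strictly in between.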

Tests cover DIFFICULTY_WEIGHTS constant, ProbeCoverage/DimensionCoverage/EvidenceCoverageResult dataclasses, dimension_map property, to_dict keys, _dcg (empty, single, zero-score, weight-sorted), and compute_evidence_coverage (empty probe set, probe with no evidence_turns skipped, full coverage → 1.0, zero coverage → 0.0, partial coverage value, kept/dropped lists, multi-probe mean, NDCG perfect/zero/partial).
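The per-probe behaviour listed for compute_evidence_coverage can be sketched as a fraction of evidence turns that survive compaction. This is a sketch under stated assumptions: probes are modelled as a mapping from probe id to a list of evidence turn indices, the class and function names are hypothetical, and returning 0.0 for an empty probe set is a guess rather than the documented behaviour.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class ProbeCoverageSketch:
    probe_id: str
    kept: List[int]
    dropped: List[int]

    @property
    def coverage(self) -> float:
        # Only built for probes with at least one evidence turn, so no division by zero.
        return len(self.kept) / (len(self.kept) + len(self.dropped))


def evidence_coverage_sketch(
    probes: Dict[str, List[int]], kept_turns: Set[int]
) -> Tuple[float, List[ProbeCoverageSketch]]:
    """Return the mean per-probe coverage and the per-probe kept/dropped lists."""
    per_probe = []
    for probe_id, evidence_turns in probes.items():
        if not evidence_turns:
            continue  # probes without evidence turns are skipped
        kept = [t for t in evidence_turns if t in kept_turns]
        dropped = [t for t in evidence_turns if t not in kept_turns]
        per_probe.append(ProbeCoverageSketch(probe_id, kept, dropped))
    if not per_probe:
        return 0.0, []  # assumed value for the empty case
    return sum(p.coverage for p in per_probe) / len(per_probe), per_probe
```

For example, one probe with both evidence turns kept and another with one of two kept would average to 0.75 under this definition.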

Tests cover EntitySet (total_count, all_entities, default dict), ENTITY_TYPES constants (presence, positive weights), extract_entities for exceptions, URLs, ports (with range filtering), file paths, CamelCase class names, pip/npm packages, and HTTP status codes. Also covers compute_coverage (empty-suffix → 1.0, empty-kept → 0.0, identical sets → 1.0, breakdown structure, half coverage, type mismatch → 0.0, weighted vs unweighted divergence).

marksverdhei and others added 6 commits February 13, 2026 13:59

marksverdhei force-pushed the main branch from bb4fb00 to 2a4c770 Compare April 1, 2026 09:58

hai-pilgrim wants to merge 6 commits into heiervang-technologies:main from hai-pilgrim:test/eval-entity-coverage
