Skip to content

test: 36 tests for lib/eval/entity_coverage.py #21

Open
hai-pilgrim wants to merge 6 commits into heiervang-technologies:main from hai-pilgrim:test/eval-entity-coverage

@hai-pilgrim

Summary

  • Adds tests/test_eval_entity_coverage.py with 36 pytest tests for lib/eval/entity_coverage.py
  • Covers EntitySet dataclass: total_count property, all_entities() method, default empty dict
  • Tests ENTITY_TYPES constant: required keys present, all weights positive, file_path/error at highest weight
  • Tests extract_entities for: exceptions (ValueError, ModuleNotFoundError), HTTPS/HTTP URLs, port numbers (colon and keyword forms, range filtering), absolute file paths, CamelCase class names, pip/npm packages, HTTP status codes (404, 500)
  • Tests compute_coverage: empty-suffix → (1.0, 1.0, {}), empty-kept → 0.0, identical sets → 1.0, breakdown structure, half-covered, type-mismatch → 0.0, weighted vs unweighted divergence by type importance
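For reviewers unfamiliar with the module, the behaviours listed above can be illustrated with a small, self-contained Python sketch. It is not the code in lib/eval/entity_coverage.py: the names mirror the ones the tests target, but the weights, the regexes, the `entities` field, the `(type, value)` pairs returned by `all_entities()`, the `(weighted, unweighted, breakdown)` return order and the breakdown shape are all assumptions. CamelCase class names, pip/npm packages and HTTP status codes are left out for brevity.

```python
import re
from dataclasses import dataclass, field

# Hypothetical weights: the tests only require positive values,
# with file_path and error sharing the highest weight.
ENTITY_TYPES: dict[str, float] = {
    "file_path": 3.0,
    "error": 3.0,
    "url": 2.0,
    "port": 1.0,
}

# Hypothetical patterns for a subset of the entity types the tests exercise.
PATTERNS = {
    "error": re.compile(r"\b[A-Z]\w*(?:Error|Exception)\b"),
    "url": re.compile(r"https?://[^\s'\"<>),]+"),
    # Absolute paths not preceded by a word char, slash or colon (skips URL parts).
    "file_path": re.compile(r"(?<![\w/:])/(?:[\w.-]+/)*[\w.-]+"),
}
# Matches the keyword form ("port 8080") and the colon form (":8080").
PORT_RE = re.compile(r"(?:\bport\s+|:)(\d{1,5})\b", re.IGNORECASE)


@dataclass
class EntitySet:
    # Maps entity type -> set of distinct values; defaults to an empty dict.
    entities: dict[str, set[str]] = field(default_factory=dict)

    @property
    def total_count(self) -> int:
        return sum(len(values) for values in self.entities.values())

    def all_entities(self) -> set[tuple[str, str]]:
        # (type, value) pairs, so equal values of different types stay distinct.
        return {(etype, v) for etype, values in self.entities.items() for v in values}


def extract_entities(text: str) -> EntitySet:
    found: dict[str, set[str]] = {}
    for etype, pattern in PATTERNS.items():
        matches = set(pattern.findall(text))
        if matches:
            found[etype] = matches
    # Range filtering: drop numbers that cannot be TCP/UDP ports.
    ports = {p for p in PORT_RE.findall(text) if 1 <= int(p) <= 65535}
    if ports:
        found["port"] = ports
    return EntitySet(found)


def compute_coverage(kept: EntitySet, suffix: EntitySet) -> tuple[float, float, dict]:
    """Return (weighted, unweighted, breakdown) coverage of suffix entities by kept."""
    needed = suffix.all_entities()
    if not needed:
        return 1.0, 1.0, {}  # nothing to cover
    have = kept.all_entities()
    breakdown: dict[str, dict[str, int]] = {}
    weight_hit = weight_total = 0.0
    for etype, value in needed:
        weight = ENTITY_TYPES.get(etype, 1.0)
        hit = (etype, value) in have  # a type mismatch does not count as covered
        weight_total += weight
        if hit:
            weight_hit += weight
        slot = breakdown.setdefault(etype, {"covered": 0, "total": 0})
        slot["total"] += 1
        slot["covered"] += int(hit)
    unweighted = sum(s["covered"] for s in breakdown.values()) / len(needed)
    return weight_hit / weight_total, unweighted, breakdown
```

With these assumed weights, keeping an `error` entity but dropping a `port` entity yields 0.75 weighted against 0.5 unweighted coverage, the kind of divergence by type importance the last bullet refers to.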

Test plan

  • All 36 tests pass via uv run pytest tests/test_eval_entity_coverage.py
  • Pure-logic tests — regex extraction and set arithmetic only

🤖 Opened by hai-pilgrim as part of the Pilgrim wandering-agent contribution run.

marksverdhei and others added 6 commits February 13, 2026 13:59
…ompaction

- install.sh now auto-configures ~/.claude/settings.json (creates pluginDirs
  entry, idempotent across create/add/already-exists cases)
- uninstall.sh now cleans up the settings.json pluginDirs entry
- Add compact-session.sh: self-contained script that finds JSONL, runs
  supercompact, backs up original, replaces, and reports results
- Simplify /supercompact command from 5-step multi-bash prompt to single
  script call with CLAUDE_PLUGIN_ROOT fallback to hardcoded install path
- Simplify PreCompact hook to backup-only (removes wasted supercompact run
  that Claude's LLM compaction immediately overwrites)
- Update README: accurate hook description, file tree with compact-session.sh,
  update/upgrade docs, standalone binary limitations clearly stated

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
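The idempotent settings.json handling described for install.sh and uninstall.sh can be sketched in Python as below. The PR implements it in shell; this version only illustrates the create/add/already-exists logic, and it assumes `pluginDirs` is a JSON list of directory strings. `ensure_plugin_dir` and `remove_plugin_dir` are hypothetical names.

```python
import json
from pathlib import Path


def _load(settings_path: Path) -> dict:
    # An empty file is treated like an empty settings object.
    text = settings_path.read_text().strip()
    return json.loads(text) if text else {}


def _save(settings_path: Path, settings: dict) -> None:
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")


def ensure_plugin_dir(settings_path: Path, plugin_dir: str) -> str:
    """Add plugin_dir to pluginDirs (assumed to be a list); safe to run repeatedly."""
    if not settings_path.exists():
        _save(settings_path, {"pluginDirs": [plugin_dir]})
        return "create"
    settings = _load(settings_path)
    dirs = settings.setdefault("pluginDirs", [])
    if plugin_dir in dirs:
        return "already-exists"  # no write, so reruns leave the file untouched
    dirs.append(plugin_dir)
    _save(settings_path, settings)
    return "add"


def remove_plugin_dir(settings_path: Path, plugin_dir: str) -> bool:
    """Uninstall counterpart: drop plugin_dir, and the key once it is empty."""
    if not settings_path.exists():
        return False
    settings = _load(settings_path)
    dirs = settings.get("pluginDirs", [])
    if plugin_dir not in dirs:
        return False
    dirs.remove(plugin_dir)
    if not dirs:
        del settings["pluginDirs"]
    _save(settings_path, settings)
    return True
```

Returning early in the already-exists case means reruns never rewrite the file, which is what makes repeated installs safe.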
export_json: file creation, empty array, result structure (method, budget,
model_key, composite, ndcg), speed/token counts, dimension scores with score
and probe_count, multiple results, valid JSON.
export_trace: file creation, path location, filename contains method/budget,
JSON has method/budget, empty answers → empty entries, matching probe included,
unmatched answer skipped, auto-creates trace dir.
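A minimal sketch of the export_trace behaviour these tests check (trace directory auto-created, filename carrying method and budget, unmatched answers skipped). The signature, the `probe_id`/`text` answer keys, the entry layout and the `trace_{method}_{budget}.json` filename pattern are assumptions, not the repository's code.

```python
import json
from pathlib import Path


def export_trace(trace_dir: Path, method: str, budget: int,
                 answers: list[dict], probes: dict[str, dict]) -> Path:
    """Write one JSON trace per (method, budget) run; all names are hypothetical."""
    trace_dir.mkdir(parents=True, exist_ok=True)  # auto-create the trace dir
    entries = []
    for answer in answers:
        probe = probes.get(answer["probe_id"])
        if probe is None:
            continue  # answers without a matching probe are skipped
        entries.append({"probe": probe, "answer": answer["text"]})
    path = trace_dir / f"trace_{method}_{budget}.json"
    payload = {"method": method, "budget": budget, "entries": entries}
    path.write_text(json.dumps(payload, indent=2))
    return path
```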
Tests cover ProbeAnswer/JudgeResult dataclass defaults and field storage,
ANSWER_MODELS/JUDGE_MODEL constants, generate_answers with empty probe set,
score_answers missing-probe path (no API call), missing OPENROUTER_API_KEY error,
and _score_one_answer JSON parsing, markdown fence stripping, score clamping,
and bad-JSON fallback.
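The judge-reply handling exercised by the _score_one_answer tests (JSON parsing, Markdown fence stripping, score clamping, bad-JSON fallback) might look roughly like this. `parse_judge_reply`, `strip_code_fences`, the `score`/`reasoning` keys, the 0 to 10 scale and the fallback score of 0 are all assumptions.

```python
import json

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the snippet fence-safe


def strip_code_fences(text: str) -> str:
    """Remove a surrounding Markdown code fence (optionally tagged json)."""
    text = text.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, including any language tag.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rstrip()
        if text.endswith(FENCE):
            text = text[: -len(FENCE)]
    return text.strip()


def parse_judge_reply(raw: str, lo: float = 0.0, hi: float = 10.0) -> dict:
    """Parse a reply like {"score": 7, "reasoning": "..."}; keys and scale are assumed."""
    try:
        data = json.loads(strip_code_fences(raw))
        score = float(data["score"])
    except (json.JSONDecodeError, KeyError, TypeError, ValueError):
        # Bad-JSON fallback: score the answer at the bottom of the scale instead of raising.
        return {"score": lo, "reasoning": "unparseable judge reply"}
    # Clamp out-of-range scores into [lo, hi].
    return {"score": min(hi, max(lo, score)), "reasoning": str(data.get("reasoning", ""))}
```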
Tests cover DIFFICULTY_WEIGHTS constant, DimensionScore/AggregateResult dataclass
defaults and fields, dimension_map property, _dcg (empty, single, sorting by weight,
zero scores, position discounting), and aggregate() (empty answers, single/multiple
models, score 0-1 normalisation, missing probes skipped, empty-dimension zero-mean,
perfect/zero/partial NDCG).
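One plausible reading of the _dcg cases above (sorting by weight, position discounting) is standard DCG with a log2 position discount, where probes are ranked by difficulty weight and each contributes score * weight. The DIFFICULTY_WEIGHTS values and the `dcg`/`ndcg` helpers below are hypothetical; the repository's exact gain formula is not shown in this PR.

```python
import math

# Hypothetical values: the real DIFFICULTY_WEIGHTS are not shown in this PR.
DIFFICULTY_WEIGHTS = {"easy": 1.0, "medium": 2.0, "hard": 3.0}


def dcg(items: list[tuple[float, float]]) -> float:
    """items are (score in [0, 1], difficulty weight) pairs.

    Items are ranked by weight, heaviest first, and the gain score * weight
    at 0-based rank r is discounted by log2(r + 2).
    """
    ranked = sorted(items, key=lambda item: item[1], reverse=True)
    return sum(score * weight / math.log2(rank + 2)
               for rank, (score, weight) in enumerate(ranked))


def ndcg(items: list[tuple[float, float]]) -> float:
    # The ideal run scores 1.0 on every probe; with nothing to rank, return 0.0.
    ideal = dcg([(1.0, weight) for _, weight in items])
    return dcg(items) / ideal if ideal > 0 else 0.0
```

Normalising by the all-correct run is what makes the perfect case come out at exactly 1.0 and the all-wrong case at 0.0, with partial runs in between.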
Tests cover DIFFICULTY_WEIGHTS constant, ProbeCoverage/DimensionCoverage/
EvidenceCoverageResult dataclasses, dimension_map property, to_dict keys,
_dcg (empty, single, zero-score, weight-sorted), and compute_evidence_coverage
(empty probe set, probe with no evidence_turns skipped, full coverage → 1.0,
zero coverage → 0.0, partial coverage value, kept/dropped lists, multi-probe
mean, NDCG perfect/zero/partial).
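Evidence coverage as described here can be sketched as the fraction of each probe's evidence turns that survive compaction, averaged over the probes that have any evidence. `ProbeCoverage` below is a stand-in with guessed fields, `evidence_coverage` is a hypothetical helper, and returning 0.0 for an empty probe set is an assumption; the real compute_evidence_coverage and EvidenceCoverageResult are not reproduced.

```python
from dataclasses import dataclass, field


@dataclass
class ProbeCoverage:
    # Stand-in with guessed fields, not the repository's dataclass.
    probe_id: str
    coverage: float
    kept: list[int] = field(default_factory=list)
    dropped: list[int] = field(default_factory=list)


def evidence_coverage(evidence_turns: dict[str, list[int]],
                      kept_turns: set[int]) -> tuple[float, list[ProbeCoverage]]:
    """Fraction of each probe's evidence turns that were kept, plus the mean."""
    results: list[ProbeCoverage] = []
    for probe_id, turns in evidence_turns.items():
        wanted = set(turns)
        if not wanted:
            continue  # probes with no evidence_turns are skipped
        kept = sorted(wanted & kept_turns)
        dropped = sorted(wanted - kept_turns)
        results.append(ProbeCoverage(probe_id, len(kept) / len(wanted), kept, dropped))
    mean = sum(r.coverage for r in results) / len(results) if results else 0.0
    return mean, results
```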
Tests cover EntitySet (total_count, all_entities, default dict), ENTITY_TYPES
constants (presence, positive weights), extract_entities for exceptions, URLs,
ports (with range filtering), file paths, CamelCase class names, pip/npm
packages, and HTTP status codes. Also covers compute_coverage (empty-suffix
→ 1.0, empty-kept → 0.0, identical sets → 1.0, breakdown structure, half
coverage, type mismatch → 0.0, weighted vs unweighted divergence).