feat: Benchmark Dataset Catalog (M120)#264

Merged: hlin99 merged 1 commit into main from feat/m120-benchmark-catalog on Apr 6, 2026
@hlin99 hlin99 commented Apr 6, 2026

Summary

Add a SQLite-backed local catalog for indexing and searching benchmark files.

Changes

  • DatasetCatalog class in catalog.py with SQLite storage
  • CatalogEntry, CatalogQuery, CatalogReport Pydantic models
  • SHA-256 file hash for duplicate detection
  • Metadata extraction: GPU type, model name, P:D ratio, QPS, request count, instances
  • Search/filter by GPU type, QPS range, P:D ratio, model name, instance count range
  • CLI catalog subcommand: add, list, search, show, remove
  • Programmatic manage_catalog() API
  • 23 new tests
  • Updated docs/iterations/current.md
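
As a rough illustration of the SHA-256 duplicate detection listed above, here is a minimal sketch using only the standard library. It is not the code in catalog.py: the `entries` table, its columns, and the `file_sha256`, `add_file`, and `demo` helpers are hypothetical names chosen for this example, and the real `DatasetCatalog` may structure this differently.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def add_file(conn: sqlite3.Connection, path: Path) -> bool:
    """Insert a file; return False if a file with the same hash already exists."""
    # The UNIQUE constraint on file_hash makes SQLite skip duplicate content.
    cur = conn.execute(
        "INSERT OR IGNORE INTO entries (file_hash, file_path) VALUES (?, ?)",
        (file_sha256(path), str(path)),
    )
    conn.commit()
    return cur.rowcount == 1


def demo() -> tuple[bool, bool, int]:
    """Add two byte-identical files and one distinct file; report the outcome."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema, assumed for this sketch only.
    conn.execute(
        "CREATE TABLE entries ("
        " id INTEGER PRIMARY KEY,"
        " file_hash TEXT NOT NULL UNIQUE,"
        " file_path TEXT NOT NULL)"
    )
    with tempfile.TemporaryDirectory() as tmp:
        a = Path(tmp) / "run_a.json"
        b = Path(tmp) / "run_a_copy.json"
        c = Path(tmp) / "run_b.json"
        a.write_text('{"qps": 10}')
        b.write_text('{"qps": 10}')  # same bytes as run_a.json
        c.write_text('{"qps": 20}')
        first = add_file(conn, a)
        duplicate = add_file(conn, b)
        add_file(conn, c)
    count = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
    conn.close()
    return first, duplicate, count


if __name__ == "__main__":
    print(demo())
```

Hashing file content rather than comparing paths means a renamed or copied benchmark file is still recognized as a duplicate, and the check is enforced by the database instead of by application logic.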

Closes #263

@hlin99-Review-Bot hlin99-Review-Bot left a comment

Approved by hlin99-Review-Bot

Idea Value: Strong. A SQLite-backed benchmark catalog is a natural next step: it makes benchmark data discoverable and queryable instead of leaving it scattered across files. Good alignment with the project's benchmarking infrastructure.

Code Quality:

  • Clean SQLite schema with proper indexes for common query patterns
  • SHA-256 duplicate detection ✓
  • Comprehensive search with multiple filter dimensions
  • CLI well-structured with Rich table output + JSON format option
  • 23 tests covering add/remove/search/duplicates/edge cases ✓
  • docs/iterations/current.md updated ✓
  • CI all green (lint + tests on 3.10/3.11/3.12) ✓

LGTM 🚀

@hlin99-Review-BotX hlin99-Review-BotX left a comment

Approved by hlin99-Review-BotX

Idea Value: Good fit. A local SQLite catalog for benchmark files is a natural extension: it makes discovery and filtering straightforward without external dependencies.

Code Quality:

  • Clean SQLite schema with proper indexes for gpu_type, pd_ratio, qps, date ✓
  • SHA-256 duplicate detection ✓
  • Pydantic models well-structured ✓
  • Search supports multiple filter dimensions with proper parameterized queries ✓
  • CLI with Rich table + JSON output ✓
  • 23 tests ✓
  • docs/iterations/current.md updated ✓
  • CI all green (lint + tests 3.10/3.11/3.12) ✓
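
For readers unfamiliar with the pattern praised above, the following sketch shows one way indexed columns and parameterized multi-filter search can fit together. It is an assumption-laden illustration, not the PR's code: the `SCHEMA` string, the `added_at` column standing in for "date", and the `Query` dataclass and `build_search` helper are hypothetical (the PR itself uses Pydantic models such as `CatalogQuery`).

```python
import sqlite3
from dataclasses import dataclass
from typing import Optional

# Hypothetical schema with one index per commonly filtered column.
SCHEMA = """
CREATE TABLE IF NOT EXISTS entries (
    id INTEGER PRIMARY KEY,
    gpu_type TEXT,
    pd_ratio TEXT,
    qps REAL,
    added_at TEXT
);
CREATE INDEX IF NOT EXISTS idx_entries_gpu_type ON entries (gpu_type);
CREATE INDEX IF NOT EXISTS idx_entries_pd_ratio ON entries (pd_ratio);
CREATE INDEX IF NOT EXISTS idx_entries_qps ON entries (qps);
CREATE INDEX IF NOT EXISTS idx_entries_added_at ON entries (added_at);
"""


@dataclass
class Query:
    """Optional filters; None means 'do not filter on this field'."""
    gpu_type: Optional[str] = None
    pd_ratio: Optional[str] = None
    min_qps: Optional[float] = None
    max_qps: Optional[float] = None


def build_search(q: Query) -> tuple[str, list]:
    """Build a SELECT with '?' placeholders and the matching parameter list."""
    clauses: list[str] = []
    params: list = []
    if q.gpu_type is not None:
        clauses.append("gpu_type = ?")
        params.append(q.gpu_type)
    if q.pd_ratio is not None:
        clauses.append("pd_ratio = ?")
        params.append(q.pd_ratio)
    if q.min_qps is not None:
        clauses.append("qps >= ?")
        params.append(q.min_qps)
    if q.max_qps is not None:
        clauses.append("qps <= ?")
        params.append(q.max_qps)
    # Only fixed SQL fragments are concatenated; user values travel as parameters.
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    sql = f"SELECT id, gpu_type, pd_ratio, qps FROM entries{where} ORDER BY id"
    return sql, params


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO entries (gpu_type, pd_ratio, qps, added_at) VALUES (?, ?, ?, ?)",
        [
            ("H100", "1:2", 10.0, "2026-04-01"),
            ("H100", "2:1", 50.0, "2026-04-02"),
            ("A100", "1:2", 20.0, "2026-04-03"),
        ],
    )
    sql, params = build_search(Query(gpu_type="H100", min_qps=20.0))
    print(conn.execute(sql, params).fetchall())
```

Because filter values are never interpolated into the SQL string, the query stays safe against injection regardless of what a CLI user passes as a GPU type or model name.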

LGTM 🚀

@hlin99 hlin99 merged commit fafc1f7 into main Apr 6, 2026
5 checks passed