
Commit 93d580f

docs: add synthetic benchmarking plan for library scanning performance
- Create task-164 for implementing benchmark suite with targets:
  - Initial import ~41k tracks: <5 min (stretch <60s)
  - No-op rescan: <10s
  - Incremental rescan: proportional to changes
- Update task-012 to depend on task-164 (benchmark before optimize)
- Add comprehensive docs/benchmark.md covering:
  - Current architecture analysis and bottlenecks
  - Required 2-phase scanning with fingerprint storage
  - Synthetic dataset strategies (shape-only, clone-based, pathological)
  - Benchmark scenarios and tooling design
  - Taskfile integration plan
  - Safety guarantees and result interpretation
1 parent ed98805 commit 93d580f

3 files changed

Lines changed: 626 additions & 4 deletions

File tree

backlog/tasks/task-012 - Implement-performance-optimizations.md

Lines changed: 12 additions & 4 deletions
@@ -4,16 +4,24 @@ title: Implement performance optimizations
 status: In Progress
 assignee: []
 created_date: '2025-09-17 04:10'
-updated_date: '2026-01-16 22:22'
+updated_date: '2026-01-17 10:30'
 labels: []
-dependencies: []
-ordinal: 27500
+dependencies:
+  - task-164
+ordinal: 12250
 ---
 
 ## Description
 
 <!-- SECTION:DESCRIPTION:BEGIN -->
-Optimize directory traversal, database operations, and network caching for better performance
+Optimize directory traversal, database operations, and network caching for better performance.
+
+**IMPORTANT**: Before implementing optimizations, complete task-164 (synthetic benchmarking) to establish baselines and validate that proposed changes actually improve performance. Premature optimization without measurement is risky for a 267GB / 41k track library.
+
+Performance targets (from benchmarking):
+- Initial import of ~41k tracks: < 5 minutes (stretch: < 60s)
+- No-op rescan (unchanged library): < 10s
+- Incremental rescan (1% delta): proportional to changes
 <!-- SECTION:DESCRIPTION:END -->
 
 ## Acceptance Criteria
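The benchmark metrics this task depends on (per-phase times, counts, throughput, peak RSS, JSON output) can be sketched with only the standard library. This is a minimal illustration, not the project's actual `bench_scan.py`; `timed_phase` and `peak_rss_mb` are hypothetical helper names:

```python
import json
import resource
import sys
import time


def timed_phase(name: str, fn, results: dict) -> None:
    """Run one scan phase, recording wall time, item count, and throughput."""
    start = time.perf_counter()
    count = fn()
    elapsed = time.perf_counter() - start
    results[name] = {
        "seconds": round(elapsed, 4),
        "items": count,
        "items_per_sec": round(count / elapsed, 1) if elapsed > 0 else None,
    }


def peak_rss_mb() -> float:
    """Peak resident set size of this process, in MiB."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS, kilobytes on Linux.
    return rss / (1024 * 1024) if sys.platform == "darwin" else rss / 1024


results: dict = {}
timed_phase("walk", lambda: 41000, results)  # placeholder for a real walk phase
results["peak_rss_mb"] = round(peak_rss_mb(), 1)
print(json.dumps(results, indent=2))
```

Emitting one JSON object per run makes it easy to diff results before and after each optimization, which is exactly the "benchmark before optimize" ordering this dependency enforces.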
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
+---
+id: task-164
+title: Implement synthetic library benchmarking for scan performance
+status: In Progress
+assignee: []
+created_date: '2026-01-17 10:29'
+updated_date: '2026-01-17 10:32'
+labels:
+  - performance
+  - testing
+  - scanning
+dependencies: []
+priority: high
+ordinal: 6125
+---
+
+## Description
+
+<!-- SECTION:DESCRIPTION:BEGIN -->
+Create a benchmarking suite to measure and validate library scanning performance before optimizing. The benchmark must prove the architecture can meet targets:
+- Initial import of ~41k tracks: < 5 minutes (stretch: < 60s)
+- No-op rescan (unchanged library): < 10s
+- Incremental rescan (1% delta): proportional to changes
+
+Key architectural requirement: scanning must use a 2-phase approach:
+1. Phase 1 (inventory): walk + stat (mtime_ns, size) + DB diff — no tag parsing
+2. Phase 2 (parse delta): mutagen only for added/changed files
+
+This requires storing fingerprints (file_mtime_ns, file_size) in the library table.
+
+Benchmarking approach:
+- Dataset A (shape-only): 41k tiny files for traversal/DB stress testing
+- Dataset B (clone-based): APFS clones of ~400 real seed files to 41k paths for realistic mutagen timing
+- Dataset C (pathological): edge cases (2k+ files in one dir, deep nesting, corrupt files, unicode)
+
+Scenarios to benchmark:
+1. Initial import (fresh DB)
+2. No-op rescan (same DB, no changes)
+3. Delta rescan (add 200, touch 200, delete 10)
+
+See docs/benchmark.md for comprehensive planning details.
+<!-- SECTION:DESCRIPTION:END -->
+
+## Acceptance Criteria
+<!-- AC:BEGIN -->
+- [ ] #1 tests/bench/ directory created with benchmark scripts
+- [ ] #2 make_synth_library.py generates Dataset A (shape) and Dataset B (clone) libraries
+- [ ] #3 bench_scan.py measures walk, stat, DB diff, parse, and DB write phases separately
+- [ ] #4 Taskfile tasks added: bench:make:shape, bench:make:clone, bench:scan:initial, bench:scan:noop, bench:scan:delta, bench:scan:full
+- [ ] #5 All benchmarks use isolated DB path (/tmp/mt-bench/mt.db) - never touch production DB
+- [ ] #6 Benchmark outputs JSON + human-readable metrics: times, counts, throughput, peak RSS
+- [ ] #7 library table schema updated with file_mtime_ns column for fingerprint storage
+- [ ] #8 Optional: bench:zig:walk task to compare Zig traversal ceiling vs Python
+<!-- AC:END -->
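The Taskfile wiring named in AC #4 might look roughly like the following go-task fragment. The task names and script paths come from the acceptance criteria; the CLI flags (`--mode`, `--out`, `--library`, `--db`, `--scenario`) are assumptions about the eventual scripts, not an existing interface:

```yaml
version: '3'

tasks:
  bench:make:shape:
    desc: Generate Dataset A (41k tiny shape-only files)
    cmds:
      - python tests/bench/make_synth_library.py --mode shape --out /tmp/mt-bench/shape

  bench:scan:noop:
    desc: Rescan an unchanged library (target < 10s)
    cmds:
      - python tests/bench/bench_scan.py --library /tmp/mt-bench/shape --db /tmp/mt-bench/mt.db --scenario noop
```

Keeping every task pointed at /tmp/mt-bench/ gives the isolation guarantee of AC #5 for free: no task ever receives the production DB path.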
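The 2-phase approach described in the task can be sketched in plain stdlib Python. The table and column names (`library`, `file_mtime_ns`, `file_size`) come from the task itself; `phase1_inventory` is a hypothetical function name, and Phase 2 (mutagen parsing of the delta) is only indicated by a comment:

```python
import os
import sqlite3
import tempfile


def phase1_inventory(root: str, db: sqlite3.Connection):
    """Phase 1 (inventory): walk + stat + DB diff, no tag parsing.

    Classifies files as added/changed/deleted purely from (mtime_ns, size)
    fingerprints, so a no-op rescan never opens a single audio file."""
    known = {
        path: (mtime_ns, size)
        for path, mtime_ns, size in db.execute(
            "SELECT path, file_mtime_ns, file_size FROM library"
        )
    }
    added, changed, seen = [], [], set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            seen.add(path)
            if path not in known:
                added.append(path)
            elif known[path] != (st.st_mtime_ns, st.st_size):
                changed.append(path)
    deleted = sorted(p for p in known if p not in seen)
    return added, changed, deleted


# Self-check against a throwaway one-file "library" and an empty DB:
with tempfile.TemporaryDirectory() as root:
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE library ("
        "path TEXT PRIMARY KEY, file_mtime_ns INTEGER, file_size INTEGER)"
    )
    with open(os.path.join(root, "a.mp3"), "wb") as f:
        f.write(b"\x00" * 64)
    added, changed, deleted = phase1_inventory(root, db)
    # Phase 2 would run mutagen over added + changed only.
```

With fingerprints stored, the no-op rescan cost is one walk, one stat per file, and one indexed lookup per path, which is what makes the < 10s target plausible at 41k tracks.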
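Dataset A from the benchmarking approach above (shape-only: many tiny files to stress traversal and DB code, never parsed) could be generated along these lines. The Artist/Album/Track layout and `make_shape_dataset` name are illustrative, not the real `make_synth_library.py`:

```python
import os
import tempfile


def make_shape_dataset(root: str, n_tracks: int = 41_000, per_dir: int = 15) -> int:
    """Dataset A (shape-only): tiny placeholder files in an Artist/Album/Track
    layout. Contents are junk bytes, since Phase 1 never opens the files."""
    written, artist, album = 0, 0, 0
    while written < n_tracks:
        d = os.path.join(root, f"Artist {artist:04d}", f"Album {album:02d}")
        os.makedirs(d, exist_ok=True)
        for track in range(min(per_dir, n_tracks - written)):
            with open(os.path.join(d, f"{track:02d} Track.mp3"), "wb") as f:
                f.write(b"\x00" * 32)  # tiny payload; shape is all that matters
            written += 1
        album += 1
        if album == 10:  # assume ~10 albums per artist, then move on
            album, artist = 0, artist + 1
    return written


# Generate a scaled-down library and verify the file count:
with tempfile.TemporaryDirectory() as root:
    made = make_shape_dataset(root, n_tracks=100)
    found = sum(len(files) for _, _, files in os.walk(root))
```

Dataset B would instead clone ~400 real seed files out to 41k paths; on macOS, `cp -c src dst` performs an APFS clonefile copy, so each of the 41k copies is near-free in both space and time while still giving mutagen real tags to parse.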
