Add benchmark diagnostics metrics #72
Code review

A few concrete issues worth addressing before merge.

Bugs

1.
timings["slowest_component_ms"] = round(max(timings.get(key, 0.0) for key in ELAPSED_KEYS), 3)
...
def slowest_component(self) -> str:
    return max(ELAPSED_KEYS, key=lambda key: self.elapsed_ms.get(key, 0.0))

Consequence:

2.
timings["critical_path"] = round(max((timings.get(key, 0.0) for key in critical_keys), default=0.0), 3)

In a sequential pipeline the critical path duration is the sum of the dependent stages, not the largest single stage. As written this is just "slowest of these seven components"; either rename it (e.g.

3. Same exception recorded multiple times in

When something inside
4.
linking_results = linking_metrics.compute(predictions, records)
results = {
    ...
    "linking": linking_results,
    "rxnorm": linking_results,
    ...
}

Both keys point to the exact same dict: every consumer now has to decide which to use, the schema makes both required with identical fields, and

5.
def build_manifest(
    *,
    ...
    sample_size: int,
    output_prefix: str,
    concurrency: int | None = None,
    results: dict,
    random_seed: int | str | None = None,
    ...

Python permits defaulted keyword-only args before required ones, but interleaving them like this (

Minor
Generated by Claude Code
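
To make item 2 of the review above concrete, here is a minimal sketch of the difference between the largest single stage and the summed duration of a strictly sequential pipeline. The stage names and timings are hypothetical and are not taken from the PR; the sketch also assumes every stage depends on the one before it, which may not hold for the real pipeline.

```python
# Hypothetical per-stage timings in milliseconds (illustrative values only).
stage_ms = {"load": 12.5, "extract": 40.0, "link": 27.5}

# Largest single stage: this is what a max() over the components reports.
slowest_stage_ms = round(max(stage_ms.values(), default=0.0), 3)

# Critical path under the assumption that the stages run strictly one after
# another: the durations add up.
critical_path_ms = round(sum(stage_ms.values()), 3)

print(slowest_stage_ms)  # 40.0
print(critical_path_ms)  # 80.0
```

If some stages can run in parallel, the sum would only cover the longest chain of dependent stages, so the right aggregation depends on the pipeline's actual dependency structure.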
Code review

Reviewed the diff against

Bug:
Summary
Test Plan