Problem
Users need to understand how prompt size affects inference performance to optimize their workloads.
Solution
Add an xpyd-bench size-impact CLI subcommand that sweeps across prompt sizes and measures performance at each size level.
Acceptance Criteria
- xpyd-bench size-impact --base-url <url> --model <model> CLI subcommand
- Sweep across prompt sizes (default: 10, 100, 500, 1000, 2000, 4000 tokens)
- Measure TTFT, TPOT, throughput at each prompt size level
- Detect linear vs sublinear vs superlinear scaling behavior
- Report: size-latency curve, inflection points, recommended max prompt size for target latency
- JSON output with per-size-level results
- --size-levels option to customize sweep sizes
- Tests covering sweep orchestration, scaling detection, and CLI integration
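One possible approach to the scaling-detection and recommended-max-size criteria is sketched below. It is only an illustration, not the xpyd-bench implementation: the function names classify_scaling and recommend_max_size, the tolerance parameter, and the choice of a log-log least-squares fit are all assumptions. The fit estimates an exponent k in latency ≈ c · size^k; k near 1 is treated as linear, below 1 as sublinear, above 1 as superlinear.

```python
import math


def classify_scaling(sizes, latencies, tolerance=0.15):
    """Classify latency growth from the slope of a log-log least-squares fit.

    Hypothetical helper: returns (label, exponent), where exponent is the
    fitted k in latency ~ c * size**k. The tolerance band is an assumption.
    """
    if len(sizes) != len(latencies) or len(sizes) < 2:
        raise ValueError("need at least two (size, latency) pairs")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in latencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    exponent = sxy / sxx  # slope of log(latency) against log(size)
    if exponent > 1 + tolerance:
        label = "superlinear"
    elif exponent < 1 - tolerance:
        label = "sublinear"
    else:
        label = "linear"
    return label, exponent


def recommend_max_size(sizes, latencies, target):
    """Largest prompt size whose latency stays within target.

    Hypothetical helper: interpolates linearly between the last measured
    level within the target and the first level that exceeds it. Returns
    None when even the smallest level exceeds the target.
    """
    points = sorted(zip(sizes, latencies))
    if points[0][1] > target:
        return None
    best = points[0][0]
    for (s0, t0), (s1, t1) in zip(points, points[1:]):
        if t1 <= target:
            best = s1
        else:
            # Here t0 <= target < t1, so the division is safe.
            best = s0 + (target - t0) * (s1 - s0) / (t1 - t0)
            break
    return int(best)
```

A fixed per-request overhead flattens the log-log slope at small prompt sizes, so a real implementation might subtract a baseline or fit only the larger levels before classifying.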