feat(M100): Request Payload Size Impact Analysis #263

@hlin99

Problem

Users need to understand how prompt size affects inference performance to optimize their workloads.

Solution

Add an xpyd-bench size-impact CLI subcommand that sweeps across a set of prompt sizes and measures TTFT, TPOT, and throughput at each size level.
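A rough sketch of how the sweep could be orchestrated, to make the proposal concrete. Everything here is an assumption for illustration: `SizeLevelResult`, `run_size_sweep`, `to_json`, the JSON field names, and the `measure` callback are hypothetical and not existing xpyd-bench APIs. `fake_measure` returns synthetic numbers so the sketch runs without a server; a real implementation would send requests to the endpoint given by --base-url.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Iterable, List, Tuple

# Default sweep levels taken from the acceptance criteria (prompt tokens).
DEFAULT_SIZE_LEVELS = (10, 100, 500, 1000, 2000, 4000)


@dataclass
class SizeLevelResult:
    """Hypothetical per-size-level record; field names are assumptions."""
    prompt_tokens: int
    ttft_ms: float
    tpot_ms: float
    throughput_tok_s: float


def run_size_sweep(
    measure: Callable[[int], Tuple[float, float, float]],
    size_levels: Iterable[int] = DEFAULT_SIZE_LEVELS,
) -> List[SizeLevelResult]:
    """Call `measure` once per distinct size level, in ascending order."""
    results = []
    for size in sorted(set(size_levels)):
        if size <= 0:
            raise ValueError(f"size level must be positive, got {size}")
        ttft_ms, tpot_ms, throughput = measure(size)
        results.append(SizeLevelResult(size, ttft_ms, tpot_ms, throughput))
    return results


def to_json(results: List[SizeLevelResult]) -> str:
    """Serialize per-size-level results; the schema is an assumption."""
    return json.dumps({"size_levels": [asdict(r) for r in results]}, indent=2)


def fake_measure(prompt_tokens: int) -> Tuple[float, float, float]:
    """Synthetic stand-in for a real benchmark request (not real data)."""
    ttft_ms = 20.0 + 0.05 * prompt_tokens
    tpot_ms = 12.0
    return ttft_ms, tpot_ms, 1000.0 / tpot_ms


if __name__ == "__main__":
    print(to_json(run_size_sweep(fake_measure)))
```

Passing the measurement function in as a callback keeps the orchestration testable without a live endpoint, which lines up with the "tests covering sweep orchestration" criterion.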

Acceptance Criteria

  • xpyd-bench size-impact --base-url <url> --model <model> CLI subcommand
  • Sweep across prompt sizes (default: 10, 100, 500, 1000, 2000, 4000 tokens)
  • Measure TTFT, TPOT, throughput at each prompt size level
  • Detect linear vs sublinear vs superlinear scaling behavior
  • Report: size-latency curve, inflection points, recommended max prompt size for target latency
  • JSON output with per-size-level results
  • --size-levels to customize sweep sizes
  • Tests covering sweep orchestration, scaling detection, and CLI integration
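One possible way to approach the scaling-detection and recommended-max-size criteria above is sketched below. It is not a settled design: it assumes latency follows a power law, fits the slope of log(latency) against log(prompt size) by least squares, and compares that slope to 1 with a hypothetical `tolerance`. The function names and the 0.1 tolerance are assumptions. Note that a constant per-request overhead pulls the fitted slope below 1, so a real implementation may want to subtract a baseline first.

```python
import math
from typing import Optional, Sequence


def loglog_slope(sizes: Sequence[float], latencies: Sequence[float]) -> float:
    """Least-squares slope of log(latency) versus log(size)."""
    if len(sizes) != len(latencies) or len(sizes) < 2:
        raise ValueError("need at least two (size, latency) pairs")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in latencies]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("size levels must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def classify_scaling(
    sizes: Sequence[float],
    latencies: Sequence[float],
    tolerance: float = 0.1,  # assumed threshold, not from the issue
) -> str:
    """Label the curve by how the fitted exponent compares to 1."""
    slope = loglog_slope(sizes, latencies)
    if slope > 1 + tolerance:
        return "superlinear"
    if slope < 1 - tolerance:
        return "sublinear"
    return "linear"


def recommended_max_size(
    sizes: Sequence[int],
    latencies: Sequence[float],
    target_latency: float,
) -> Optional[int]:
    """Largest prompt size expected to stay within the target latency.

    Assumes sizes are ascending and latency is non-decreasing; interpolates
    linearly between the two measured points that bracket the target.
    Returns None if even the smallest size exceeds the target.
    """
    if latencies[0] > target_latency:
        return None
    best = sizes[0]
    points = list(zip(sizes, latencies))
    for (s0, l0), (s1, l1) in zip(points, points[1:]):
        if l1 <= target_latency:
            best = s1
            continue
        # Here l0 <= target < l1, so the division is safe.
        return int(s0 + (target_latency - l0) * (s1 - s0) / (l1 - l0))
    return best
```

With this approach, inflection points could be found by computing the same slope over adjacent pairs of size levels and flagging where it changes sharply, though that part is left out of the sketch.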
