feat: Workload Mix Optimizer (M115) #254

Merged: hlin99 merged 1 commit into main from feat/m115-workload-mix-optimizer on Apr 6, 2026

@hlin99 (Member) commented Apr 6, 2026

Summary

Add Workload Mix Optimizer (M115): given benchmark data for multiple workloads (different models or request patterns), find the minimum total number of GPU instances that still meets every per-workload SLA constraint.

Changes

  • WorkloadMixOptimizer class in workload_mix.py with brute-force enumeration and budget pruning
  • WorkloadSpec, WorkloadAllocation, MixOptimizationResult Pydantic models
  • SLA compliance check using measured percentiles from benchmark data
  • CLI workload-mix subcommand with --workload (repeatable), --total-gpus, Rich table + JSON output
  • Programmatic optimize_workload_mix() API
  • 32 new tests covering models, optimizer, convenience API, public imports
  • Updated ROADMAP.md (M114 ✅, M115 🔄) and docs/iterations/current.md

Closes #253
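
For readers who have not opened the diff, the "brute-force enumeration with budget pruning" idea can be sketched roughly as below. This is an illustration only, not the code in workload_mix.py: the function name `search_mix`, the `meets_sla` callback, and the simplification that one instance uses one GPU are all assumptions made for the example.

```python
from typing import Callable, Optional, Sequence


def search_mix(
    workloads: Sequence[str],
    total_gpus: int,
    meets_sla: Callable[[str, int], bool],
) -> Optional[dict[str, int]]:
    """Return the allocation with the fewest instances that meets every SLA.

    Hypothetical sketch: assumes one GPU per instance and that
    meets_sla(name, count) reports whether `count` instances of workload
    `name` satisfy its SLA according to benchmark data.
    """
    best: Optional[dict[str, int]] = None

    def recurse(index: int, used: int, current: dict[str, int]) -> None:
        nonlocal best
        remaining = len(workloads) - index
        # Budget pruning: each remaining workload needs at least one instance.
        if used + remaining > total_gpus:
            return
        # Bound pruning: this branch cannot beat the best allocation so far.
        if best is not None and used + remaining >= sum(best.values()):
            return
        if index == len(workloads):
            best = dict(current)
            return
        name = workloads[index]
        # Leave at least one GPU for every workload that comes after this one.
        max_count = total_gpus - used - (remaining - 1)
        for count in range(1, max_count + 1):
            if not meets_sla(name, count):
                continue
            current[name] = count
            recurse(index + 1, used + count, current)
            del current[name]

    recurse(0, 0, {})
    return best


# Toy SLA model: a workload passes once it has at least `need[name]` instances.
need = {"chat": 2, "batch": 3}
result = search_mix(["chat", "batch"], 8, lambda name, n: n >= need[name])
```

With the toy SLA model above, an 8-GPU budget yields 2 instances for `chat` and 3 for `batch`, while a 4-GPU budget yields `None` because no allocation fits. Exhaustive search like this stays practical only for a handful of workloads and a modest budget.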

@hlin99-Review-Bot (Contributor) left a comment

Approved by hlin99-Review-Bot

Idea Value: Strong addition — workload mix optimization is a natural next step after multi-backend comparison. Aligns well with the project's data-driven GPU planning goals.

Code Quality:

  • Clean WorkloadMixOptimizer with brute-force + budget pruning — appropriate for the problem size
  • Pydantic models well-defined with proper validators (WorkloadSpec, WorkloadAllocation, MixOptimizationResult)
  • SLA check uses measured percentiles from benchmark data (no guessing ✓)
  • CLI follows established patterns with Rich table + JSON output
  • 32 tests covering models, optimizer, convenience API, and public imports
  • docs/iterations/current.md updated ✓
  • ROADMAP.md updated ✓
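
As a rough illustration of what "SLA check uses measured percentiles" can mean, the sketch below compares a percentile computed from recorded latency samples against a limit instead of extrapolating one. The helper names `percentile` and `meets_latency_sla` and the nearest-rank method are assumptions for the example; the PR may compute its percentiles differently.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile of measured samples, for 0 < pct <= 100."""
    if not samples:
        raise ValueError("need at least one measured sample")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least pct% of samples at or below it.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_latency_sla(latencies_ms: Sequence[float], pct: float, limit_ms: float) -> bool:
    """True when the measured percentile is within the SLA limit."""
    return percentile(latencies_ms, pct) <= limit_ms


# Hypothetical benchmark samples for one workload, in milliseconds.
latencies = [100.0, 120.0, 130.0, 150.0, 400.0]
p50_ok = meets_latency_sla(latencies, 50, 200.0)
p95_ok = meets_latency_sla(latencies, 95, 200.0)
```

Here the median passes a 200 ms limit but the 95th percentile does not, because the single 400 ms outlier dominates the tail of such a small sample.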

CI: All checks pass (lint, test 3.10/3.11/3.12)

LGTM 👍

@hlin99-Review-BotX left a comment

Approved by hlin99-Review-BotX

Idea Value: Solid — workload mix optimization is the logical extension for multi-workload GPU cluster planning. Aligns with project goals.

Code Quality:

  • WorkloadMixOptimizer brute-force + pruning approach is clean and appropriate
  • Pydantic models with proper validators (WorkloadSpec.name strip, weight > 0)
  • _check_sla uses measured percentiles — data-driven ✓
  • _is_better comparison logic is correct (SLA-met first, then fewer instances, then lower waste)
  • CLI follows repo patterns (Rich table + JSON, argparse registration)
  • 32 tests with good coverage across models, optimizer, convenience API, imports
  • docs/iterations/current.md and ROADMAP.md updated ✓
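
The ordering described for `_is_better` (SLA-met first, then fewer instances, then lower waste) maps naturally onto a tuple comparison. The sketch below is one way to express it, not the PR's implementation: the `Candidate` dataclass, its field names, and the meaning of `waste` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    """Hypothetical summary of one allocation; field names are assumptions."""
    sla_met: bool
    total_instances: int
    waste: float  # e.g. unused capacity; the PR's exact definition is not shown here


def is_better(candidate: Candidate, incumbent: Optional[Candidate]) -> bool:
    """True when `candidate` should replace `incumbent`."""
    if incumbent is None:
        return True

    def key(c: Candidate) -> tuple:
        # Tuples compare element by element, so SLA status dominates,
        # then instance count, then waste. False sorts before True, hence
        # `not c.sla_met` puts SLA-compliant candidates first.
        return (not c.sla_met, c.total_instances, c.waste)

    return key(candidate) < key(incumbent)


compliant = Candidate(sla_met=True, total_instances=6, waste=0.4)
cheaper_but_failing = Candidate(sla_met=False, total_instances=3, waste=0.0)
same_size_less_waste = Candidate(sla_met=True, total_instances=6, waste=0.1)
```

Under this ordering `compliant` beats `cheaper_but_failing` despite using more instances, and `same_size_less_waste` beats `compliant` only on the final tie-breaker.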

CI: All checks pass (lint, test 3.10/3.11/3.12)

LGTM 👍 — 2nd approval, should auto-merge.

@hlin99 merged commit 4603410 into main on Apr 6, 2026
5 checks passed