
docs(readme): land ModelRouter prominently for the 0.7.8 release#464

Merged
Defilan merged 1 commit into defilantech:main from Defilan:docs/readme-modelrouter-audit on May 13, 2026

Conversation

@Defilan (Member) commented May 13, 2026

What

README audit that brings the ModelRouter CRD, fail-closed semantics, and per-rule timeout budgets to the front of the page. Right now the README doesn't mention any of the major capabilities shipping in v0.7.8 (about to release via PR #443), so a viewer arriving from external coverage sees an outdated story.

Why

Marcel Dempers ("That DevOps Guy", 91.2K YouTube subscribers) published a video titled "Run LLMs on Kubernetes with LLMKube" on 2026-05-13 at 12:07 UTC. GitHub traffic data attributes today's six new stars to direct viewer follow-through (all six landed in the hours after the video published). Naveen (Kubernetes with Naveen, 10.8K X followers) amplified the video at 17:31 UTC.

A first-time viewer landing on the README needs to see what `ModelRouter` is and why `InferenceService` is now only half the story. Today's README doesn't surface either.

No issue link — this is README polish, not a bug or a feature.

How

Six surgical edits:

  • New 0.7.8 callout below The Problem so the viewer's helm-install version matches the README narrative.
  • New "Composition: ModelRouter" section between Metal Agent and How Is This Different. Includes a worked example (strict-PII / complex-fallback / cloud-credential) drawn from `hack/demo-modelrouter.sh`, plus the three properties the audience cares about (fail-closed, per-rule budgets, OpenAI-compatible streaming).
  • Comparison table grows three rows: hybrid local+cloud routing with policy, fail-closed for regulated data, per-rule timeout budgets. These are the columns where LLMKube stands alone against every peer.
  • "Versus newer adjacent projects" prose gains a LiteLLM entry explaining ModelRouter composes with LiteLLM rather than replacing it.
  • Features list gains a "Routing & policy" block parallel to Inference / GPU / Operations.
  • TOC swaps Architecture for ModelRouter (Architecture mermaid is already inline).
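
To make the new section concrete, the worked example (strict-PII / complex-fallback / cloud-credential) could look roughly like the sketch below. This is illustrative only: the API group, field names, and structure are guesses and may not match the real CRD schema; see `config/samples/inference_v1alpha1_modelrouter.yaml` in the repo for the authoritative sample.

```yaml
# Illustrative sketch only -- field names are assumptions, not the real schema.
apiVersion: inference.llmkube.dev/v1alpha1   # group assumed from the sample filename
kind: ModelRouter
metadata:
  name: demo-router
spec:
  failClosed: true              # no matching rule => reject, never spill to cloud
  rules:
    - name: strict-pii          # regulated traffic stays on-cluster
      match:
        labels:
          pii: "true"
      target: local-llama       # an in-cluster InferenceService
      timeoutSeconds: 5         # per-rule budget, not a global one
    - name: complex-fallback    # long prompts spill to a bigger model
      match:
        minPromptTokens: 4000
      target: cloud-tier
      timeoutSeconds: 30
      credentialsRef:           # cloud credentials supplied via a Secret
        name: openai-api-key
```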

Nothing removed. The Problem statement gains one sentence about mixed local+cloud routing being its own platform problem.

All three internal links verified locally:

  • `docs/site/concepts/model-router.md`
  • `config/samples/inference_v1alpha1_modelrouter.yaml`
  • `deployment/macos/README.md` (unchanged)
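
The fail-closed and per-rule-budget semantics described above can be sketched as a small routing model. This is an illustrative Python sketch, not LLMKube's actual implementation: the `Rule` fields, predicate shape, and request keys are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # predicate over the incoming request
    backend: str                     # target model / endpoint
    timeout_s: float                 # per-rule budget, not a global one
    local_only: bool = False         # regulated traffic must stay local

def route(request: dict, rules: list[Rule]) -> tuple[str, float]:
    """Return (backend, per-rule timeout) for the first eligible rule.

    Fail-closed: a request tagged `regulated` that matches no local-only
    rule is rejected outright instead of spilling to a cloud backend.
    """
    for rule in rules:
        if rule.matches(request):
            if request.get("regulated") and not rule.local_only:
                continue  # regulated data never takes a cloud rule
            return rule.backend, rule.timeout_s
    if request.get("regulated"):
        raise PermissionError("fail-closed: no local rule for regulated request")
    raise LookupError("no matching rule")

# Hypothetical rule set mirroring the strict-PII / complex-fallback example.
rules = [
    Rule("strict-pii", lambda r: r.get("pii", False),
         "local-llama", 2.0, local_only=True),
    Rule("complex-fallback", lambda r: r.get("tokens", 0) > 4000,
         "cloud-tier", 30.0),
]

route({"pii": True, "regulated": True}, rules)  # -> ("local-llama", 2.0)
route({"tokens": 5000}, rules)                  # -> ("cloud-tier", 30.0)
```

A regulated request that only matches the cloud rule (e.g. `{"regulated": True, "tokens": 5000}`) raises `PermissionError` rather than falling through, which is the fail-closed property the README changes highlight.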

Checklist

  • Tests added/updated — N/A, docs only
  • `make test` passes locally — N/A, docs only
  • `make lint` passes locally — N/A, docs only
  • Commit messages follow conventional commits
  • All commits are signed off (`git commit -s`) per DCO
  • Documentation updated — this IS the doc update

Suggested merge order

Merge PR #443 (release-please `chore: release 0.7.8`) first so the chart version matches the README's 0.7.8 references. Then merge this PR. Both can land within a few minutes of each other.

…ffic wave

That DevOps Guy (91K subs) published a video featuring LLMKube on
2026-05-13, and a measurable star bump tracks the publish time. The
README didn't mention ModelRouter, fail-closed semantics, or per-rule
budgets at all, so any viewer landing on the repo was missing the
headline new capability that ships in v0.7.8 (about to release).

Updates:

- New "0.7.8 just shipped" callout below The Problem so the version
  number a viewer sees matches the helm chart they're about to
  install.
- New top-level "Composition: ModelRouter" section between The
  Metal Agent and How Is This Different. Includes a worked example
  (strict pii rule + complex fallback + cloud-tier credentialsRef)
  drawn from the demo we use day-to-day. Three pull-out properties
  the audience cares about (fail-closed, per-rule timeouts,
  OpenAI-compatible streaming) called out in plain language.
- Comparison table grows three rows: hybrid local+cloud routing,
  fail-closed for regulated data, per-rule timeout budgets. These
  are the columns where LLMKube is alone vs vLLM / Ollama / KServe /
  LocalAI.
- "Versus newer adjacent projects" prose gains a LiteLLM entry
  explaining that ModelRouter composes with LiteLLM rather than
  replacing it.
- Features list grows a "Routing & policy" block parallel to the
  existing Inference / GPU / Operations blocks.
- TOC swaps "Architecture" for "ModelRouter" since the
  architecture mermaid is already inline.

Nothing removed. The Problem statement gains one sentence about
mixed local + cloud routing being its own platform problem.

Signed-off-by: Christopher Maher <chris@mahercode.io>
@codecov codecov bot commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


Defilan merged commit deb24bb into defilantech:main on May 13, 2026
12 of 14 checks passed
github-actions bot mentioned this pull request on May 13, 2026
