
docs(readme): land ModelRouter prominently for the 0.7.8 release#464

Merged
Defilan merged 1 commit into defilantech:main from Defilan:docs/readme-modelrouter-audit on May 13, 2026

Conversation

@Defilan (Member) commented May 13, 2026

What

README audit that brings the ModelRouter CRD, fail-closed semantics, and per-rule timeout budgets to the front of the page. Right now the README doesn't mention any of the major capabilities shipping in v0.7.8 (about to release via PR #443), so a viewer arriving from external coverage sees an outdated story.

Why

Marcel Dempers ("That DevOps Guy", 91.2K YouTube subscribers) published a video titled "Run LLMs on Kubernetes with LLMKube" on 2026-05-13 at 12:07 UTC. GitHub traffic data attributes today's six new stars to direct viewer follow-through (all six landed in the hours after the video published). Naveen (Kubernetes with Naveen, 10.8K X followers) amplified the video at 17:31 UTC.

A first-time viewer landing on the README needs to see what `ModelRouter` is and why `InferenceService` is now only half the story. Today's README doesn't surface either.

No issue link — this is README polish, not a bug or a feature.

How

Six surgical edits:

  • New 0.7.8 callout below The Problem so the viewer's helm-install version matches the README narrative.
  • New "Composition: ModelRouter" section between Metal Agent and How Is This Different. Includes a worked example (strict-PII / complex-fallback / cloud-credential) drawn from `hack/demo-modelrouter.sh`, plus the three properties the audience cares about (fail-closed, per-rule budgets, OpenAI-compatible streaming).
  • Comparison table grows three rows: hybrid local+cloud routing with policy, fail-closed for regulated data, per-rule timeout budgets. These are the columns where LLMKube stands alone against every peer.
  • "Versus newer adjacent projects" prose gains a LiteLLM entry explaining ModelRouter composes with LiteLLM rather than replacing it.
  • Features list gains a "Routing & policy" block parallel to Inference / GPU / Operations.
  • TOC swaps Architecture for ModelRouter (Architecture mermaid is already inline).
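
To make the new section concrete, the worked example (strict-PII / complex-fallback / cloud-credential) could look roughly like the sketch below. This is illustrative only: the API group, field names, and structure are guesses and may not match the real CRD schema; see `config/samples/inference_v1alpha1_modelrouter.yaml` in the repo for the authoritative sample.

```yaml
# Illustrative sketch only -- field names are assumptions, not the real schema.
apiVersion: inference.llmkube.dev/v1alpha1   # group assumed from the sample filename
kind: ModelRouter
metadata:
  name: demo-router
spec:
  failClosed: true              # no matching rule => reject, never spill to cloud
  rules:
    - name: strict-pii          # regulated traffic stays on-cluster
      match:
        labels:
          pii: "true"
      target: local-llama       # an in-cluster InferenceService
      timeoutSeconds: 5         # per-rule budget, not a global one
    - name: complex-fallback    # long prompts spill to a bigger model
      match:
        minPromptTokens: 4000
      target: cloud-tier
      timeoutSeconds: 30
      credentialsRef:           # cloud credentials supplied via a Secret
        name: openai-api-key
```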

Nothing removed. The Problem statement gains one sentence about mixed local+cloud routing being its own platform problem.

All three internal links verified locally:

  • `docs/site/concepts/model-router.md`
  • `config/samples/inference_v1alpha1_modelrouter.yaml`
  • `deployment/macos/README.md` (unchanged)
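
The fail-closed and per-rule-budget semantics described above can be sketched as a small routing model. This is an illustrative Python sketch, not LLMKube's actual implementation: the `Rule` fields, predicate shape, and request keys are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # predicate over the incoming request
    backend: str                     # target model / endpoint
    timeout_s: float                 # per-rule budget, not a global one
    local_only: bool = False         # regulated traffic must stay local

def route(request: dict, rules: list[Rule]) -> tuple[str, float]:
    """Return (backend, per-rule timeout) for the first eligible rule.

    Fail-closed: a request tagged `regulated` that matches no local-only
    rule is rejected outright instead of spilling to a cloud backend.
    """
    for rule in rules:
        if rule.matches(request):
            if request.get("regulated") and not rule.local_only:
                continue  # regulated data never takes a cloud rule
            return rule.backend, rule.timeout_s
    if request.get("regulated"):
        raise PermissionError("fail-closed: no local rule for regulated request")
    raise LookupError("no matching rule")

# Hypothetical rule set mirroring the strict-PII / complex-fallback example.
rules = [
    Rule("strict-pii", lambda r: r.get("pii", False),
         "local-llama", 2.0, local_only=True),
    Rule("complex-fallback", lambda r: r.get("tokens", 0) > 4000,
         "cloud-tier", 30.0),
]

route({"pii": True, "regulated": True}, rules)  # -> ("local-llama", 2.0)
route({"tokens": 5000}, rules)                  # -> ("cloud-tier", 30.0)
```

A regulated request that only matches the cloud rule (e.g. `{"regulated": True, "tokens": 5000}`) raises `PermissionError` rather than falling through, which is the fail-closed property the README changes highlight.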

Checklist

  • Tests added/updated — N/A, docs only
  • `make test` passes locally — N/A, docs only
  • `make lint` passes locally — N/A, docs only
  • Commit messages follow conventional commits
  • All commits are signed off (`git commit -s`) per DCO
  • Documentation updated — this IS the doc update

Suggested merge order

Merge PR #443 (release-please `chore: release 0.7.8`) first so the chart version matches the README's 0.7.8 references. Then merge this PR. Both can land within a few minutes of each other.

…ffic wave

That DevOps Guy (91K subs) published a video featuring LLMKube on
2026-05-13, and a measurable star bump tracks the publish time. The
README didn't mention ModelRouter, fail-closed semantics, or per-rule
budgets at all, so any viewer landing on the repo was missing the
headline new capability that ships in v0.7.8 (about to release).

Updates:

- New "0.7.8 just shipped" callout below The Problem so the version
  number a viewer sees matches the helm chart they're about to
  install.
- New top-level "Composition: ModelRouter" section between The
  Metal Agent and How Is This Different. Includes a worked example
  (strict pii rule + complex fallback + cloud-tier credentialsRef)
  drawn from the demo we use day-to-day. Three pull-out properties
  the audience cares about (fail-closed, per-rule timeouts,
  OpenAI-compatible streaming) called out in plain language.
- Comparison table grows three rows: hybrid local+cloud routing,
  fail-closed for regulated data, per-rule timeout budgets. These
  are the columns where LLMKube is alone vs vLLM / Ollama / KServe /
  LocalAI.
- "Versus newer adjacent projects" prose gains a LiteLLM entry
  explaining that ModelRouter composes with LiteLLM rather than
  replacing it.
- Features list grows a "Routing & policy" block parallel to the
  existing Inference / GPU / Operations blocks.
- TOC swaps "Architecture" for "ModelRouter" since the
  architecture mermaid is already inline.

Nothing removed. The Problem statement gains one sentence about
mixed local + cloud routing being its own platform problem.

Signed-off-by: Christopher Maher <chris@mahercode.io>
@codecov codecov bot commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


Defilan merged commit deb24bb into defilantech:main on May 13, 2026
12 of 14 checks passed
github-actions bot mentioned this pull request on May 13, 2026
