DOC-1581 inference provider production path by djwfyi · Pull Request #2348 · loft-sh/vcluster-docs

djwfyi · 2026-06-30T19:15:10Z

Content Description

Adds the Inference Provider production path for teams building managed model-serving endpoints on GPU infrastructure. Includes a new production guide, a model-serving runtimes integration page, GPU and inference autoscaling patterns (KEDA and Prometheus), a guide to installing the GPU Operator inside tenant clusters, and a readability pass on the new content.

Preview Link

Internal Reference

Partially addresses DOC-1581

AI review: mention @claude in a comment to request a review or changes. See CONTRIBUTING.md for available commands.

FORK LIMITATION: @claude does not work on pull requests opened from forks. GitHub Actions cannot access the required secrets for fork-originated PRs. To use AI review, push your branch directly to this repository.

@netlify /docs

Fix awkward phrasing, informal terms, and unclear word choices across inference-provider.mdx, model-serving.mdx, and gpu-hpa-dcgm.mdx. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

netlify · 2026-06-30T19:15:14Z

✅ Deploy Preview for vcluster-docs-site ready!

Name	Link
🔨 Latest commit	`3b515b8`
🔍 Latest deploy log	https://app.netlify.com/projects/vcluster-docs-site/deploys/6a44314d3d3d2c00078cf68e
😎 Deploy Preview	https://deploy-preview-2348--vcluster-docs-site.netlify.app/docs

To edit notification comments on pull requests, go to your Netlify project configuration.

github-actions · 2026-06-30T19:15:53Z

DOC-1581: Docs: make Inference Provider a first-class cross-product docs journey

djwfyi · 2026-06-30T19:16:48Z

Remaining work and known gaps from DOC-1581

This PR covers Phases 1–3 of the issue. The items below are either out of scope for this PR or deferred.

Done in this PR

- inference-provider.mdx: new production guide (Day 0/1/2)
- model-serving.mdx: tenant-runtime pattern, smoke-test deployment, model storage, Gateway API endpoint exposure
- gpu-hpa-dcgm.mdx: serving-metrics section covering KEDA, queue depth, request concurrency, TTFT, and latency
- gpu-and-accelerator-support.mdx: GPU Operator install guide inside tenant clusters
- Cross-links from overview.mdx, ai-cloud.mdx, and gpu-cloud-platform.mdx
- Architecture diagram (inference-provider.svg)

Not done — cross-repo (separate PRs needed)

- vmetal-docs Phase 4: inference-specific GPU capacity design in docs/operate/gpu-fleet.mdx (hardware SKUs for serving tiers, per-tenant node types, warm pool guidance, OS image/driver rollout for model-serving fleets) and docs/overview.mdx (add inference provider path)
- vnode-docs Phase 5: inference isolation section in docs/concepts/tenancy-models.mdx and docs/concepts/vcluster-integration.mdx (when vNode matters for inference: untrusted adapters, custom containers, privileged workloads)
- Cross-links from vmetal-docs and vnode-docs back to the inference provider path in vcluster-docs

Not done — vcluster-docs (future issues)

- Security/trust model page (Phase 6): a dedicated page or section with a buyer-facing validation checklist covering what an enterprise buyer can verify about isolation boundaries, RBAC, quota enforcement, and audit trail
- Detailed Day 2 operations content: the Day 2 table links out, but there is no prose yet for capacity exhaustion handling, endpoint drain and rollout, stuck model pods, GPU burn-in/validation, node reclaim on customer offboarding, or regional failover
- SEO pass (Phase 7): natural placement of inference-provider search terms and a final cross-link sweep across the three repos
- End-to-end "serve a model" walkthrough: model-serving.mdx covers the runtime pattern but not the full step-by-step path from vMetal node prep through a validated external endpoint; the issue scoped this as a potential child issue

djwfyi · 2026-06-30T21:01:53Z

Reciprocal dependency: vmetal-docs#29 (DOC-1581 Phase 4)

The vMetal half of this inference provider journey is now up in loft-sh/vmetal-docs#29 (DOC-1581 Phase 4). It adds an "Inference provider capacity" section to the GPU fleet ops page and cross-links the vMetal docs back to the inference provider production path created here.

Merge order: vmetal-docs#29 links to https://www.vcluster.com/docs/vcluster/production-guide/inference-provider, which doesn't exist until this PR merges and deploys. The links are external URLs, so no build gate fails either way, but please merge #2348 first (or merge both together) so the vMetal links don't point at a page that isn't published yet.

This PR already links into the vMetal pages (operate/gpu-fleet, deploy/gpu-quickstart), which are live, so there's no dependency in this direction.
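
For anyone sequencing the two merges, here is a minimal sketch of a check that the target page is live before vmetal-docs#29 goes in. It is not part of either PR. The function name `is_published` and the injectable `opener` parameter are hypothetical, and it assumes the docs site answers `HEAD` requests with a 2xx status once the page is deployed.

```python
from urllib.error import HTTPError, URLError
from urllib.request import Request, urlopen

# URL that vmetal-docs#29 links to (taken from the merge-order note above).
TARGET = "https://www.vcluster.com/docs/vcluster/production-guide/inference-provider"


def is_published(url, opener=urlopen, timeout=10.0):
    """Return True if a HEAD request to `url` answers with a 2xx status.

    `opener` defaults to urllib's urlopen and can be swapped out in tests.
    urlopen raises HTTPError for 4xx/5xx responses and URLError for
    connection problems, so both are treated as "not published".
    """
    request = Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (HTTPError, URLError):
        return False
```

Calling `is_published(TARGET)` after the #2348 deploy finishes should return `True` if the page is reachable. If the site rejects `HEAD` requests, the same check would need a `GET` instead.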

Retarget the Day 0 and Day 2 vMetal cross-links to the new #inference-provider-capacity anchor added in vmetal-docs#29, and tie the endpoint-readiness warm-pool guidance to vMetal's hardware-layer guidance on warm pools versus on-demand provisioning. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

djwfyi and others added 4 commits June 30, 2026 13:16

DOC-1581 add inference provider docs

9442c28

DOC-1581 refine inference provider guide

046dd06

docs: clarify vBilling maturity

1db0e6a

docs: readability pass on inference provider docs

6deecd1

djwfyi requested a review from a team as a code owner June 30, 2026 19:15

djwfyi self-assigned this Jun 30, 2026

djwfyi wants to merge 5 commits into main from DOC-1581/inference-production-path
