skills(model-serving): merge dev-side training/agent flows from a-d-k experimental by jamesbroadhead · Pull Request #84 · databricks/databricks-agent-skills

jamesbroadhead · 2026-05-24T20:52:22Z

Summary

Merges the dev-side surface of experimental:databricks-ml-training-serving/ into stable's databricks-model-serving. Closes #73's TODO #1b.

The two skills had near-zero content overlap — stable was ops-focused (manage existing endpoints via CLI); experimental was dev-focused (train, register, log a PyFunc or ResponsesAgent, deploy). Combining them avoids forcing users to invoke two skills for what is functionally one workflow.

Shape:

SKILL.md remains the ops entry-point (create / query / update / scale endpoints, App integration).
3 new references/ files carry the dev-side flow verbatim from experimental.
Frontmatter description broadened so agent routing fires on dev-side asks too (train, register, PyFunc, ResponsesAgent) — see description field for the full trigger phrase list. NOT for: no-code agents (use databricks-agent-bricks); MLflow scorers (use databricks-mlflow-evaluation).

Changes

| Stable file | Source on experimental | Covers |
| --- | --- | --- |
| `references/training-and-serving.md` | `databricks-ml-training-serving/SKILL.md` | Canonical train/register/serve flow, `mlflow.{sklearn,xgboost,…}.autolog()` patterns, UC alias-based promotion (`@prod`/`@challenger`), batch scoring via `spark_udf`, real-time endpoint create + zero-downtime version swap, `state.ready` vs `state.config_update` poll-both gotcha, `jobs submit --no-wait` serverless deploy pattern, Foundation Model API endpoints runtime-list (replaces the earlier static catalog draft per @QuentinAmbard's review), and the gotchas trap-table. |
| `references/custom-pyfunc.md` | `databricks-ml-training-serving/1-custom-pyfunc.md` | File-based PyFunc ("Models from Code" via `python_model="model.py"`), `infer_signature`, `code_paths`, pre-deploy validation via `mlflow.models.predict(env_manager="uv")`. |
| `references/genai-agents.md` | `databricks-ml-training-serving/2-genai-agents.md` | `ResponsesAgent` interface, LangGraph agent with `UCFunctionToolkit` + `VectorSearchRetrieverTool`, the `create_text_output_item` raw-dict-silently-fails gotcha, the `resources=[...]` passthrough-auth list, async deploy via `agents.deploy()` from a serverless job, query via CLI and OpenAI-compatible client. |

All 3 ports are verbatim — only mechanical adjustments:

Strip the SKILL.md frontmatter on the SKILL→reference promotion.
Sibling-file renames: 1-custom-pyfunc.md → custom-pyfunc.md, 2-genai-agents.md → genai-agents.md.
Cross-skill paths bumped one level for the deeper references/ location. Stable peers (databricks-jobs) use ../../; experimental-only peers (databricks-agent-bricks, databricks-mlflow-evaluation, databricks-vector-search, databricks-unity-catalog) use ../../../experimental/.
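
The path adjustment described above can be sketched as a small rewrite function. This is an illustration only, not the script used for the port: the function name, the regex, and the assumption that every cross-skill link has the form `(../<skill>/SKILL.md)` are hypothetical. The list of experimental-only peers is taken from the bullet above.

```python
import re

# Peers that live under experimental/ (from the PR description).
EXPERIMENTAL_PEERS = {
    "databricks-agent-bricks",
    "databricks-mlflow-evaluation",
    "databricks-vector-search",
    "databricks-unity-catalog",
}

# Assumed link shape: "(../<skill>/SKILL.md)" with an optional "#anchor".
_SIBLING_LINK = re.compile(r"\((\.\./)([A-Za-z0-9_-]+)/SKILL\.md(#[^)]*)?\)")


def bump_cross_skill_links(markdown: str) -> str:
    """Rewrite sibling-skill links for a file that moved into references/."""

    def _rewrite(match: re.Match) -> str:
        skill = match.group(2)
        anchor = match.group(3) or ""
        if skill in EXPERIMENTAL_PEERS:
            prefix = "../../../experimental/"
        else:
            prefix = "../../"
        return f"({prefix}{skill}/SKILL.md{anchor})"

    return _SIBLING_LINK.sub(_rewrite, markdown)
```

Because the pattern only matches a single leading `../`, running the function a second time leaves already-bumped links untouched.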

SKILL.md updates (kept tight — ops focus preserved):

FM-endpoint discovery now points to references/training-and-serving.md#foundation-model-api-endpoints.
New ### Develop & deploy new models subsection under "What's Next" with a 3-row table linking the new references.
Frontmatter description expanded to cover the dev surface (see above).
Version bumped to 0.3.0.

Manifest: regenerated via python3 scripts/skills.py generate.

Reviewer history

Earlier commits on this branch made two mistakes that have since been corrected:

Initial draft sourced FM endpoint content from main:databricks-model-serving/ rather than experimental:databricks-ml-training-serving/ — caught by @QuentinAmbard, reworked.
The first rework still shipped a static FM endpoint catalog. The experimental skill deliberately rejects static catalogs in favour of databricks serving-endpoints list | jq ... plus runtime-resolved defaults. Replaced with the runtime-list snippet.

Coverage vs. #73 TODO #1b

| TODO #1b item | Landed in this PR |
| --- | --- |
| classical-ml autolog patterns | `training-and-serving.md` § Train and register |
| Custom PyFunc signatures | `custom-pyfunc.md` (whole file) |
| `ResponsesAgent` + `create_text_output_item` gotcha | `genai-agents.md` § CRITICAL: output items must use helper methods |
| `UCFunctionToolkit` + `VectorSearchRetrieverTool` resource passthrough | `genai-agents.md` § Log + register + § Resources that need passthrough auth |
| Foundation Model API endpoint table → runtime-list | `training-and-serving.md` § Foundation Model API endpoints |

Plus content the original TODO didn't enumerate: batch scoring via spark_udf, real-time endpoint create + version swap, the state.ready vs state.config_update poll-both gotcha, serverless jobs submit --no-wait deploy pattern, the consolidated gotchas trap-table.
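
To make the poll-both gotcha concrete, here is a minimal sketch of a readiness check that inspects both state fields. It assumes the endpoint is available as a dict shaped like the serving-endpoints `get` response, with `state.ready` equal to `"READY"` and `state.config_update` equal to `"NOT_UPDATING"` once a deployment has settled. The function names and the injected `fetch` callable are hypothetical, not part of the ported skill.

```python
import time
from typing import Callable


def endpoint_is_serving(endpoint: dict) -> bool:
    """True only when the endpoint is ready AND no config update is pending."""
    state = endpoint.get("state", {})
    return (
        state.get("ready") == "READY"
        and state.get("config_update") == "NOT_UPDATING"
    )


def wait_until_serving(
    fetch: Callable[[], dict],
    timeout_s: float = 1200.0,
    interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll `fetch` until both state fields settle, or raise on failure/timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        endpoint = fetch()
        state = endpoint.get("state", {})
        if state.get("config_update") == "UPDATE_FAILED":
            raise RuntimeError("endpoint config update failed")
        if endpoint_is_serving(endpoint):
            return endpoint
        if time.monotonic() >= deadline:
            raise TimeoutError("endpoint did not settle before the timeout")
        sleep(interval_s)
```

Checking `ready` alone is not enough during a version swap: an endpoint that is still serving the previous version can report `READY` while the new config is in progress.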

Known follow-ups (out of scope)

references/training-and-serving.md has an anchor link #one-time-runs-jobs-submit--async-pattern-for-notebooks into databricks-jobs/SKILL.md. The section exists in a-d-k's databricks-jobs but not yet in d-a-s databricks-jobs/SKILL.md, so the link falls back to the top of the file.
references/off-platform-streaming.md (pre-existing from #76) is in the manifest but not linked from SKILL.md. Untouched by this PR.
If #87 promotes databricks-vector-search to stable, the ../../../experimental/databricks-vector-search/SKILL.md link in training-and-serving.md should be flipped to the stable path. The link sweep in #87 (skills: promote databricks-vector-search to stable) should handle it.

Test plan

python3 scripts/skills.py generate runs clean.
python3 scripts/skills.py validate passes (Everything is up to date.).
All 3 new reference files diff cleanly against upstream (only the mechanical adjustments above).
All 6 cross-skill links in the new reference files resolve to existing files in this repo.
CI green on this branch.
Owner review (@databricks/eng-apps-devex per CODEOWNERS).
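
The cross-skill link check in the test plan could be scripted roughly as follows. This is a sketch under assumptions, not the check that was actually run: the function name is hypothetical, and it only handles inline Markdown links of the form `[text](target)`, skipping URLs with a scheme and pure in-page anchors.

```python
import re
from pathlib import Path

# Inline Markdown links: [text](target)
_MD_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Targets that start with a URL scheme such as "https:" or "mailto:"
_HAS_SCHEME = re.compile(r"[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(markdown_file: Path) -> list[str]:
    """Return relative link targets that do not resolve to an existing path."""
    text = markdown_file.read_text(encoding="utf-8")
    broken = []
    for target in _MD_LINK.findall(text):
        if _HAS_SCHEME.match(target) or target.startswith("#"):
            continue  # external URL or in-page anchor
        path_part = target.split("#", 1)[0]
        if not (markdown_file.parent / path_part).exists():
            broken.append(target)
    return broken
```

It only verifies that the target file exists relative to the linking file; it does not check that an `#anchor` resolves to a heading, and it says nothing about whether the same relative path still resolves after the skill is installed elsewhere.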

This pull request and its description were written by Claude.

Phase 1 of #73's TODO #1b. Adds references/fm-api-endpoints.md with the curated Foundation Model API endpoint table (chat/instruct + embedding models) from databricks-solutions/ai-dev-kit's model-serving skill, plus common defaults and query examples (CLI + SDK).

Stripped: the cloud/language prefix on the docs link, and the leftover MCP-tool references in the source. The endpoint table itself is static catalog data, with no MCP coupling.

SKILL.md updates:
- bump version to 0.2.0
- point Endpoint Types table at the new reference
- point the Foundation Model discovery bullet at the new reference

Subsequent phases (separate PRs / commits) port the remaining dev-side content: classical-ml autolog patterns, Custom PyFunc signatures, ResponsesAgent with the create_text_output_item gotcha, UCFunctionToolkit + VectorSearchRetrieverTool resource passthrough.

Co-authored-by: Isaac

Aligns the verbatim a-d-k port with the live docs.databricks.com supported-models page (validated via WebFetch on 2026-05-26):

ADDED (missing from a-d-k snapshot):
- databricks-claude-opus-4-7 (now most capable Claude)
- databricks-gpt-5-5-pro, 5-5
- databricks-gpt-5-4, 5-4-mini, 5-4-nano
- databricks-gpt-5-3-codex, 5-2-codex
- databricks-gemini-3-1-flash-lite, 3-5-flash
- databricks-qwen35-122b-a10b (Preview)

REMOVED (retired, no longer in docs):
- databricks-claude-3-7-sonnet
- databricks-meta-llama-3-1-405b-instruct

UPDATED notes:
- claude-opus-4-6 no longer "Most capable"
- gpt-5-2 no longer "Latest"
- gpt-5-1-codex-{max,mini} + gpt-5-2-codex marked retiring 2026-07-16
- gemini-3-pro marked retired 2026-03-26 with redirect through 2026-06-07
- Several Gemini / Codex endpoints annotated with cross-geo requirement
- qwen3-next-80b annotated as Preview

OPENING PARAGRAPH:
- "available in every workspace" -> "available in supported Model Serving regions"; calls out cross-geo requirement for several endpoints

NOT TOUCHED (out of scope: not docs-validatable from supported-models page):
- served_entities[].entity_name guidance (line 3 second half)
- SKILL.md "system.ai.* catalog" claim on the pay-per-token row

These remain as in the a-d-k snapshot and should be revisited if/when docs cover them directly.

Test plan: `scripts/skills.py validate` -> "Everything is up to date"; `scripts/skills.py generate` -> only refreshes manifest.json timestamps.

Co-authored-by: Isaac

QuentinAmbard · 2026-05-26T13:07:11Z

@jamesbroadhead From the content I see, I suspect this is also coming from main? The experimental skill is https://github.com/databricks-solutions/ai-dev-kit/blob/experimental/databricks-skills/databricks-ml-training-serving/SKILL.md
Should I also open another merge PR for suggestion?

jamesbroadhead · 2026-05-26T15:36:10Z

Hi @QuentinAmbard — Claude here, working with James.

You're right, and I owe you (and the PR description) a correction. I checked both branches:

experimental/databricks-skills/databricks-model-serving/ does not exist. The skill was renamed/restructured to databricks-ml-training-serving/ on experimental (3 files: SKILL.md + 1-custom-pyfunc.md + 2-genai-agents.md).
main/databricks-skills/databricks-model-serving/ does exist, with 9 reference files including the static FM endpoint catalog this PR ports.

Content-wise it's a clean fingerprint match for main: the chat/instruct + embedding tables (rows like `databricks-claude-opus-4-6 | Anthropic | Most capable, 1M context`), the "available in every workspace" prose, and the table headings are verbatim from main:databricks-model-serving/SKILL.md and don't appear anywhere on the experimental side. So the PR description's claim of porting from experimental/databricks-skills/databricks-model-serving is wrong; that path was never on experimental.

The bigger issue is philosophical: the experimental databricks-ml-training-serving/SKILL.md deliberately rejects a static catalog:

"Pay-per-token, pre-provisioned in every workspace. New models land regularly and a static skill list goes stale fast — always list at runtime instead of hard-coding names."

It ships databricks serving-endpoints list | jq ... plus runtime-resolved defaults ("highest-numbered Claude Sonnet", "highest-numbered -codex-max") instead of a hard-coded table. So even with the 2026-05-26 doc-validated catalog refresh layered on, what this PR adds is the opposite of the current experimental guidance.
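
As an illustration of a runtime-resolved default such as "highest-numbered Claude Sonnet", here is a minimal sketch that picks the highest-versioned endpoint from a list of names. It assumes the names have already been extracted from the `databricks serving-endpoints list` output; the function name and the keyword-matching rule are hypothetical, and the upstream skill may resolve defaults differently.

```python
from typing import Iterable, Optional, Sequence


def pick_highest_numbered(
    names: Iterable[str], keywords: Sequence[str]
) -> Optional[str]:
    """Pick the endpoint whose name contains every keyword token and has the
    highest version, where the version is the tuple of numeric name tokens."""
    best_version = None
    best_name = None
    for name in names:
        tokens = name.split("-")
        if tokens[0] != "databricks":
            continue  # only consider the databricks- prefixed endpoints
        if not all(keyword in tokens for keyword in keywords):
            continue
        # "databricks-claude-opus-4-7" -> (4, 7)
        version = tuple(int(token) for token in tokens if token.isdigit())
        if best_version is None or version > best_version:
            best_version, best_name = version, name
    return best_name


# Example names; "databricks-claude-sonnet-4-5" is a hypothetical placeholder.
names = [
    "databricks-claude-opus-4-6",
    "databricks-claude-opus-4-7",
    "databricks-claude-3-7-sonnet",
    "databricks-claude-sonnet-4-5",
    "databricks-gpt-5-1-codex-max",
    "databricks-gpt-5-1-codex-mini",
]
default_agent_model = pick_highest_numbered(names, ("claude", "sonnet"))
default_code_model = pick_highest_numbered(names, ("codex", "max"))
```

Resolving the default this way means a newly landed model is picked up on the next run without editing the skill, which is the point of the runtime-list approach.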

I'll rework this PR to actually align with the experimental skill — replace the static catalog reference file with the runtime-list snippet + the runtime-resolved defaults, and fix the PR description. The doc-validated catalog work isn't wasted; it just shouldn't be how the stable skill steers callers.

On "Should I also open another merge PR for suggestion?" — happy to coordinate. If you mean a PR upstream into a-d-k's experimental to evolve databricks-ml-training-serving further, please go ahead; we'll re-sync into databricks-agent-skills after. If you meant something else, let me know what you had in mind.

…ot static catalog

Quentin pointed out (PR #84) that the prior two commits actually ported from `main:databricks-skills/databricks-model-serving/`, not `experimental:databricks-skills/databricks-ml-training-serving/` as the PR description claimed. The two skills take opposite approaches:

- `main` ships a static catalog table of FM API endpoint names.
- `experimental` deliberately rejects that ("a static skill list goes stale fast - always list at runtime instead of hard-coding names") and ships a `databricks serving-endpoints list | jq ...` one-liner plus runtime-resolved defaults (highest-numbered Claude Sonnet for agents, highest-numbered `-codex-max` for code).

Re-port to match the experimental philosophy:

- `references/fm-api-endpoints.md`: replace the static catalog with the runtime-list snippet (filtered by `databricks-` name prefix AND `system.ai.*` served entity, to exclude non-FM endpoints sharing the prefix), runtime-resolved family defaults, and CLI + SDK query examples that use a placeholder endpoint name rather than a hard-coded model.
- `SKILL.md`: update the Endpoint Types row + the Foundation-Model discovery bullet to reframe the reference as "discover at runtime" rather than "curated table".

Version stays at 0.2.0 (frontmatter unchanged → manifest unchanged).

The 2026-05-26 catalog refresh in the previous commit is dropped here: the experimental skill's point is that no static table is the right shape, so curating one against docs.databricks.com isn't useful for the stable skill either.

Co-authored-by: Isaac

…ental port

Previous commit (c148500) restated the experimental section in my own words and added a "Querying" section + provisioned-throughput aside + docs-link gloss that aren't in the upstream skill. The PR's stated goal is to port from experimental: do an actual port, not a paraphrase.

`references/fm-api-endpoints.md` now mirrors the `## Foundation Model API endpoints` section of `experimental:databricks-ml-training-serving/SKILL.md` verbatim (heading promoted from `##` to `#` since this is a standalone file): intro paragraph + the `databricks serving-endpoints list | jq ...` one-liner + the family-based default-picking rule. Nothing else.

Also trim the SKILL.md discovery bullet back toward its original shape: link to the reference file for the runtime-list snippet, then the same `system.ai` / `serving-endpoints list` / `get-open-api` alternatives that were already there.

Co-authored-by: Isaac

…ntal

Expands the port from the FM-endpoints-only scope to cover every section of `experimental:databricks-ml-training-serving/`. Mirrors the experimental skill's 3-file structure 1:1 into stable's `references/` directory; the standalone fm-api-endpoints.md added in earlier commits goes away (its content lives inline in training-and-serving.md exactly as it does in experimental's SKILL.md).

Added (all verbatim ports, mechanical adjustments only):

references/training-and-serving.md
Ports experimental SKILL.md content. Mechanical changes only: frontmatter stripped (destination is a reference file, not a SKILL.md); `1-custom-pyfunc.md` → `custom-pyfunc.md`, `2-genai-agents.md` → `genai-agents.md` (filename renames); `../<skill>/SKILL.md` → `../../<skill>/SKILL.md` (one more level of nesting since this file is in references/ rather than at the skill root). Content covers: canonical train/register/serve flow, `mlflow.{sklearn,xgboost,…}.autolog()` patterns, UC alias-based promotion, batch scoring via `spark_udf`, real-time endpoint create + zero-downtime version swap, `state.ready` vs `state.config_update` poll-both gotcha, `jobs submit --no-wait` serverless deploy pattern, Foundation Model API endpoints runtime-list, and the full gotchas trap-table.

references/custom-pyfunc.md
Ports experimental 1-custom-pyfunc.md verbatim. Mechanical change: `[SKILL.md]` → `[training-and-serving.md]` where the original cross-referenced its parent SKILL.md. Content: file-based PyFunc ("Models from Code"), `infer_signature`, `code_paths`, pre-deploy validation via `mlflow.models.predict(env_manager="uv")`.

references/genai-agents.md
Ports experimental 2-genai-agents.md verbatim. Mechanical changes: cross-skill paths bumped one level deeper; `[SKILL.md]` → `[training-and-serving.md]`. Content covers: `ResponsesAgent` interface, LangGraph agent with `UCFunctionToolkit` + `VectorSearchRetrieverTool`, the `create_text_output_item` raw-dict-silently-fails gotcha, the `resources=[...]` passthrough-auth list (DatabricksServingEndpoint, DatabricksFunction, DatabricksVectorSearchIndex, DatabricksLakebase), async deploy via `agents.deploy()` from a serverless job, query via CLI and OpenAI-compatible client.

Removed:

references/fm-api-endpoints.md
Standalone file from earlier commits; its content lives inline in training-and-serving.md exactly as it does in experimental's SKILL.md, so the deliberate split is no longer needed.

Stable SKILL.md updates (minimal, ops-focus preserved):
- FM-endpoint link targets updated from `references/fm-api-endpoints.md` to `references/training-and-serving.md#foundation-model-api-endpoints` in the Endpoint Types table row and the FM-discovery bullet.
- New `### Develop & deploy new models` subsection under "What's Next" with a 3-row table pointing at the new dev-side references, framed as "this skill is ops-focused; for the dev-side flow, see below".

Manifest regenerated.

Co-authored-by: Isaac

- The mechanical `../` → `../../` rewrite in the verbatim port assumed every peer skill is stable, but 4 of them live in `experimental/`. `../../<skill>/SKILL.md` resolved to `skills/<skill>/SKILL.md` which does not exist for `databricks-agent-bricks`, `databricks-mlflow-evaluation`, `databricks-vector-search`, `databricks-unity-catalog`. Repointed to `../../../experimental/<skill>/SKILL.md`. `databricks-jobs` link unchanged (it's stable).
- SKILL.md frontmatter `description` only described the ops surface, so agents wouldn't route dev-side asks (train, register, PyFunc, ResponsesAgent) to this skill. Broadened to cover both ops and the new dev surface.
- Version bumped 0.2.0 → 0.3.0 + manifest regenerated.

Co-authored-by: Isaac

QuentinAmbard

nice let's merge this one, I'll send a followup PR on top!

simonfaltum · 2026-05-27T10:44:33Z

+
+## Deploy (async job, ~15 min)
+
+`databricks.agents.deploy()` blocks for ~15 minutes — don't run it inline from the CLI. Submit as a serverless job so the chat session doesn't hold the connection.


This is great - should we add something about how agents can check whether a serverless job has already been submitted for the deploy?

(Claude here.)

Good call — added in 8c8a1b3. Two cheap checks just before the submit:

databricks jobs list-runs --active-only filtered on run_name == "deploy_<model>" to catch an already-in-flight deploy.

databricks serving-endpoints get <endpoint_name> to skip the redeploy if the endpoint already exists on the right version.

If either hits, the recipe now says to follow the existing run with jobs get-run instead of submitting a new one.
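
The first check can be sketched as a filter over the JSON output of `databricks jobs list-runs --active-only`. This is a hedged illustration, not the recipe added in 8c8a1b3: the function name is hypothetical, and the output shape (either a bare JSON array of runs or an object with a `runs` key, each run carrying `run_id` and `run_name`) is an assumption to verify against the CLI version in use.

```python
import json
from typing import Optional


def find_active_deploy_run(runs_json: str, model: str) -> Optional[int]:
    """Return the run_id of an in-flight run named deploy_<model>, if any.

    `runs_json` is assumed to be the JSON text printed by
    `databricks jobs list-runs --active-only` with JSON output enabled.
    """
    payload = json.loads(runs_json)
    # Accept both a bare list of runs and an object wrapping them in "runs".
    runs = payload.get("runs", []) if isinstance(payload, dict) else payload
    wanted = f"deploy_{model}"
    for run in runs:
        if run.get("run_name") == wanted:
            return run.get("run_id")
    return None
```

A caller would follow the returned run with `jobs get-run` and only submit a new job when the function returns `None` and the endpoint check also comes back empty.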

…-phase1 # Conflicts: # manifest.json

@simonfaltum

Per @simonfaltum review: before resubmitting a deploy serverless job, agents should check whether a run is already in flight (active job runs filtered on run_name) or whether the target endpoint already exists in the right state. Avoids wasting ~15 min of serverless and racing for the same endpoint name. Co-authored-by: Isaac

…apx Related Skills entry `databricks-app-apx` was the FastAPI+React stack referenced from ai-dev-kit's `databricks-apps-python` skill. It has been removed upstream (a-d-k is deprecated; the apx-on-CLI flow merged into the stable `databricks-apps` skill via #84/#73). The "Related Skills" bullet is the last dangling reference inside this repo. This PR was prepared by Claude.

QuentinAmbard · 2026-05-28T08:37:04Z

Stacked a follow-up on this in #110 — adds a separate experimental/databricks-ml-training skill for the dev-side content (MLflow autolog, UC registration, custom PyFunc, hand-rolled ResponsesAgent, batch scoring via spark_udf) that complements the model-serving content landed here. Also closes five small gaps where non-obvious serving behavior fell through during the original a-d-k port (MLflow Deployments client tags= top-level + served_model_name derivation, zero-downtime version-swap pattern, two-state-field readiness rationale, classical-ML dataframe_records query shape, Serving-UI "Owned by me" SP filter).

#110 includes the commits from this PR at its base — please merge this one first, then #110 will rebase cleanly onto the new main.

- Drop ../../../experimental/... cross-skill links that 404 when installed (skills install flat under ~/.claude/skills/, not under stable/ vs experimental/). Use plain skill-name references instead.
- Replace ai-dev-kit-specific tag examples ("aidevkit_project") with a neutral "project": "demo" so a d-a-s skill doesn't bleed a-d-k convention.
- Tighten SKILL.md description from ~870 chars to ~290 chars, matching the convention being established in PR #107.

Co-authored-by: Isaac

jamesbroadhead requested review from a team and simonfaltum May 24, 2026 20:52

jamesbroadhead mentioned this pull request May 24, 2026

databricks-skills: tombstone redirects to databricks-agent-skills databricks-solutions/ai-dev-kit#546

Open

11 tasks

jamesbroadhead marked this pull request as ready for review May 26, 2026 09:26

jamesbroadhead requested a review from lennartkats-db as a code owner May 26, 2026 09:26

jamesbroadhead added 2 commits May 26, 2026 09:47

jamesbroadhead force-pushed the jb/model-serving-port-phase1 branch from c9015d8 to d400eff Compare May 26, 2026 09:47

jamesbroadhead added 2 commits May 26, 2026 15:38

jamesbroadhead changed the title ~~skills(model-serving): port dev-side content from a-d-k (phase 1: FM API endpoints)~~ skills(model-serving): port FM API endpoints section from a-d-k experimental May 26, 2026

jamesbroadhead changed the title ~~skills(model-serving): port FM API endpoints section from a-d-k experimental~~ skills(model-serving): port dev-side content from a-d-k experimental (TODO #1b) May 26, 2026

jamesbroadhead changed the title ~~skills(model-serving): port dev-side content from a-d-k experimental (TODO #1b)~~ skills(model-serving): merge dev-side training/agent flows from a-d-k experimental May 27, 2026

QuentinAmbard approved these changes May 27, 2026

simonfaltum reviewed May 27, 2026

Merge remote-tracking branch 'origin/main' into jb/model-serving-port…

15d7b4c

…-phase1 # Conflicts: # manifest.json

jamesbroadhead enabled auto-merge (squash) May 27, 2026 11:12

jamesbroadhead requested a review from QuentinAmbard May 27, 2026 14:38

jamesbroadhead mentioned this pull request May 27, 2026

docs(experimental/databricks-apps-python): drop stale databricks-app-apx Related Skills entry #106

Open

3 tasks

QuentinAmbard mentioned this pull request May 28, 2026

skills: split into databricks-model-serving (ops) + databricks-ml-training (experimental) #110

Open

simonfaltum approved these changes May 28, 2026

jamesbroadhead added 2 commits May 28, 2026 11:14

Merge remote-tracking branch 'origin/main' into jb/model-serving-port…

937d126

…-phase1

chore: regenerate manifest after main merge (description now from SKI…

77642d6

…LL.md)

jamesbroadhead wants to merge 11 commits into main from jb/model-serving-port-phase1
