databricks · jamesbroadhead · May 24, 2026 · May 26, 2026
@@ -249,16 +249,19 @@
       "version": "0.0.1"
     },
     "databricks-model-serving": {
-      "description": "Manage Databricks Model Serving endpoints via CLI.",
+      "description": "Databricks Model Serving (ops) plus MLflow model development (dev): manage serving endpoints, train and register models to Unity Catalog with @prod aliases, batch-score via spark_udf, build custom...",
       "files": [
         "SKILL.md",
         "agents/openai.yaml",
         "assets/databricks.png",
         "assets/databricks.svg",
-        "references/off-platform-streaming.md"
+        "references/custom-pyfunc.md",
+        "references/genai-agents.md",
+        "references/off-platform-streaming.md",
+        "references/training-and-serving.md"
       ],
       "repo_dir": "skills",
-      "version": "0.1.0"
+      "version": "0.3.0"
     },
     "databricks-pipelines": {
       "description": "Develop Lakeflow Spark Declarative Pipelines (formerly Delta Live Tables) on Databricks.",

@@ -1,9 +1,9 @@
 ---
 name: databricks-model-serving
-description: "Manage Databricks Model Serving endpoints via CLI. Use when asked to create, configure, query, or manage model serving endpoints for LLM inference, custom models, or external models."
+description: "Databricks Model Serving (ops) plus MLflow model development (dev): manage serving endpoints, train and register models to Unity Catalog with @prod aliases, batch-score via spark_udf, build custom PyFunc / ResponsesAgent models, and discover Foundation Model API endpoints."
 compatibility: Requires databricks CLI (>= v0.294.0)
 metadata:
-  version: "0.1.0"
+  version: "0.3.0"
 parent: databricks-core
 ---
 
@@ -17,7 +17,7 @@ Model Serving provides managed endpoints for serving LLMs, custom ML models, and
 
 | Type | When to Use | Key Detail |
 |------|-------------|------------|
-| Pay-per-token | Foundation Model APIs (Llama, DBRX, etc.) | Uses `system.ai.*` catalog models, simplest setup |
+| Pay-per-token | Foundation Model APIs (Llama, GPT-5, Claude, Gemini, etc.) | Uses `system.ai.*` catalog models, simplest setup. Discover endpoints at runtime — see [references/training-and-serving.md § Foundation Model API endpoints](references/training-and-serving.md#foundation-model-api-endpoints). |
 | Provisioned throughput | Dedicated GPU capacity | Guaranteed throughput, higher cost |
 | Custom model | Your own MLflow models or containers | Deploy any model with an MLflow signature |
 
@@ -74,7 +74,7 @@ databricks serving-endpoints create <ENDPOINT_NAME> \
   }' --profile <PROFILE>
 ```
 
-- Discover available Foundation Models: check the `system.ai` catalog in Unity Catalog, or use `databricks serving-endpoints list --profile <PROFILE>` to see available endpoints. Use `databricks serving-endpoints get-open-api <ENDPOINT_NAME> --profile <PROFILE>` to inspect the endpoint's API schema.
+- Discover available Foundation Models: see [references/training-and-serving.md § Foundation Model API endpoints](references/training-and-serving.md#foundation-model-api-endpoints) for the runtime-list snippet and default-picking rules. You can also check the `system.ai` catalog in Unity Catalog, or run `databricks serving-endpoints list --profile <PROFILE>` to see what's deployed in the workspace. Use `databricks serving-endpoints get-open-api <ENDPOINT_NAME> --profile <PROFILE>` to inspect a specific endpoint's API schema.
 - Long-running operation; the CLI waits for completion by default. Use `--no-wait` to return immediately, then poll:
   ```bash
   databricks serving-endpoints get <ENDPOINT_NAME> --profile <PROFILE>
@@ -177,6 +177,16 @@ env:
 
 Then add a tRPC route to call it from your app. For the full app integration pattern, use the **`databricks-apps`** skill and read the [Model Serving Guide](../databricks-apps/references/appkit/model-serving.md).
 
+### Develop & deploy new models
+
+This skill is ops-focused (manage existing endpoints). For the dev-side flow — train a model, register to Unity Catalog, log a PyFunc or `ResponsesAgent`, deploy — see the references below.
+
+| Reference | When to read |
+|---|---|
+| [references/training-and-serving.md](references/training-and-serving.md) | Train + register classical ML with `mlflow.autolog`, alias-based promotion (`@prod`), batch scoring via `spark_udf`, real-time endpoint create + zero-downtime version swap, async deploy via `jobs submit --no-wait`. Includes the Foundation Model API endpoints runtime-list and the gotchas table. |
+| [references/custom-pyfunc.md](references/custom-pyfunc.md) | When `autolog` isn't enough — file-based `PythonModel` ("Models from Code"), `infer_signature`, `code_paths`, pre-deploy validation with `mlflow.models.predict(env_manager="uv")`. |
+| [references/genai-agents.md](references/genai-agents.md) | Hand-rolled `ResponsesAgent` with LangGraph + `UCFunctionToolkit` + `VectorSearchRetrieverTool`. Includes the `create_text_output_item` helper-method gotcha and the `resources=[...]` passthrough-auth list. |
+
 ## Troubleshooting
 
 | Error | Solution |

@@ -0,0 +1,106 @@
+# Custom pyfunc model
+
+Use a custom pyfunc when sklearn / XGBoost autolog isn't enough: custom preprocessing that a sklearn pipeline doesn't capture, multiple sub-models behind one endpoint, external API calls during inference, or business-logic-heavy post-processing.
+
+The UC registry and serving flow are the same as for classical ML; only the *logging* step changes.
+
+## End-to-end example: file-based pyfunc with preprocessing + sub-model
+
+Project layout:
+
+```
+my_model/
+├── model.py        # PythonModel + mlflow.models.set_model(...)
+├── log_model.py    # Logs + registers to UC
+└── artifacts/
+    ├── preprocessor.pkl
+    └── booster.json
+```
+
+```python
+# model.py — logged verbatim via python_model="model.py" (Models from Code).
+# DO NOT pickle a class instance; use this file-path pattern instead.
+import pickle
+
+import mlflow
+import pandas as pd
+import xgboost as xgb
+from mlflow.pyfunc import PythonModel
+
+class TurbineRiskModel(PythonModel):
+    def load_context(self, context):
+        # Artifact keys resolve to local file paths when the model is loaded.
+        with open(context.artifacts["preprocessor"], "rb") as f:
+            self.pre = pickle.load(f)
+        self.booster = xgb.Booster()
+        self.booster.load_model(context.artifacts["booster"])
+
+    def predict(self, context, model_input: pd.DataFrame, params=None) -> pd.DataFrame:
+        X = self.pre.transform(model_input)
+        # Booster.predict requires a DMatrix, not a raw array or DataFrame.
+        proba = self.booster.predict(xgb.DMatrix(X))
+        return pd.DataFrame({
+            "risk_score": proba,
+            "risk_level": ["HIGH" if p > 0.7 else "MEDIUM" if p > 0.4 else "LOW" for p in proba],
+        })
+
+mlflow.models.set_model(TurbineRiskModel())
+```
+
+```python
+# log_model.py
+import mlflow
+import pandas as pd
+from mlflow.models import infer_signature
+from mlflow.tracking import MlflowClient
+
+mlflow.set_registry_uri("databricks-uc")
+mlflow.set_experiment("/Users/me@example.com/turbine_risk")
+
+CATALOG, SCHEMA, NAME = "ai_demo_gen", "wind_farm", "turbine_risk"
+FULL_NAME = f"{CATALOG}.{SCHEMA}.{NAME}"
+
+sample_input = pd.DataFrame({"vib_rms": [0.4], "rpm_mean": [18.2], "bearing_temp_max": [71.3]})
+sample_output = pd.DataFrame({"risk_score": [0.0], "risk_level": ["LOW"]})
+
+with mlflow.start_run():
+    info = mlflow.pyfunc.log_model(
+        name="model",                      # MLflow 3.x keyword; on MLflow 2.x use artifact_path="model"
+        python_model="model.py",           # file path, not an instance
+        artifacts={
+            "preprocessor": "artifacts/preprocessor.pkl",
+            "booster":      "artifacts/booster.json",
+        },
+        signature=infer_signature(sample_input, sample_output),
+        input_example=sample_input,
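+        # Pin exact versions (the endpoint rebuilds the env from these). Keep the mlflow pin aligned
+        # with the version that logs the model, and pin pandas too for a reproducible build: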
+        pip_requirements=["mlflow==2.22.0", "xgboost==2.1.3", "scikit-learn==1.5.2", "pandas"],
+        # Extra modules to ship with the model (e.g. shared util libs):
+        # code_paths=["src/utils.py"],
+        registered_model_name=FULL_NAME,
+    )
+
+# Pre-deploy validation — rebuilds the env locally and runs predict().
+# Catches missing deps / signature drift BEFORE the endpoint does.
+mlflow.models.predict(
+    model_uri=info.model_uri,
+    input_data=sample_input,
+    env_manager="uv",   # MLflow ≥ 2.22 with uv installed; on older versions pass "virtualenv" instead
+)
+
+# Promote to @prod
+client = MlflowClient(registry_uri="databricks-uc")
+v = max(client.search_model_versions(f"name='{FULL_NAME}'"), key=lambda x: int(x.version)).version
+client.set_registered_model_alias(FULL_NAME, "prod", v)
+```
+
+**Why `python_model="model.py"`**: MLflow logs the file verbatim instead of pickling the class, which avoids unpickle failures when the training and serving runtimes use different Python versions. Pair it with `code_paths=[...]` to ship companion modules. The contract is exactly one `mlflow.models.set_model(instance)` call at the end of the file.
+
+## Consume
+
+Same two paths as autologged classical ML — see [training-and-serving.md](training-and-serving.md#consume-batch-scoring-over-delta).
+
+- **Batch**: `mlflow.pyfunc.spark_udf(spark, model_uri=f"models:/{FULL_NAME}@prod", env_manager="local")` over a Delta table.
+- **Real-time**: `client.create_endpoint(...)` (see training-and-serving.md). Because `predict` returns a DataFrame, the query response is DataFrame-shaped JSON.
+
+```bash
+databricks serving-endpoints query turbine-risk-endpoint --profile <PROFILE> --json '{
+  "dataframe_records": [{"vib_rms": 0.6, "rpm_mean": 19.0, "bearing_temp_max": 78.0}]
+}'
+# → {"predictions": [{"risk_score": 0.82, "risk_level": "HIGH"}]}
+```
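The request and response shapes in the query above can be handled client-side with a little stdlib code. The following is a minimal sketch, assuming the endpoint accepts the `dataframe_records` format and returns a top-level `predictions` list as in the example. The helper names `build_payload` and `parse_predictions` are hypothetical and not part of MLflow or the Databricks CLI, and the response is the sample from the example rather than a live call.

```python
import json

# Input columns taken from the sample_input used to infer the signature.
FEATURES = ("vib_rms", "rpm_mean", "bearing_temp_max")


def build_payload(rows):
    """Hypothetical helper: validate rows and wrap them as dataframe_records JSON."""
    for i, row in enumerate(rows):
        missing = [col for col in FEATURES if col not in row]
        if missing:
            raise ValueError(f"row {i} is missing columns: {missing}")
    records = [{col: float(row[col]) for col in FEATURES} for row in rows]
    return json.dumps({"dataframe_records": records})


def parse_predictions(response_text):
    """Hypothetical helper: extract (risk_score, risk_level) pairs from the response body."""
    body = json.loads(response_text)
    return [(p["risk_score"], p["risk_level"]) for p in body["predictions"]]


payload = build_payload(
    [{"vib_rms": 0.6, "rpm_mean": 19.0, "bearing_temp_max": 78.0}]
)

# Sample response copied from the CLI example, not fetched from an endpoint.
sample_response = '{"predictions": [{"risk_score": 0.82, "risk_level": "HIGH"}]}'
results = parse_predictions(sample_response)
```

Validating column names before sending keeps a signature mismatch out of the endpoint logs; the string in `payload` can be passed to `--json` unchanged.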