9 changes: 6 additions & 3 deletions manifest.json
@@ -249,16 +249,19 @@
"version": "0.0.1"
},
"databricks-model-serving": {
"description": "Manage Databricks Model Serving endpoints via CLI.",
"description": "Databricks Model Serving (ops) plus MLflow model development (dev): manage serving endpoints, train and register models to Unity Catalog with @prod aliases, batch-score via spark_udf, build custom...",
"files": [
"SKILL.md",
"agents/openai.yaml",
"assets/databricks.png",
"assets/databricks.svg",
"references/off-platform-streaming.md"
"references/custom-pyfunc.md",
"references/genai-agents.md",
"references/off-platform-streaming.md",
"references/training-and-serving.md"
],
"repo_dir": "skills",
"version": "0.1.0"
"version": "0.3.0"
},
"databricks-pipelines": {
"description": "Develop Lakeflow Spark Declarative Pipelines (formerly Delta Live Tables) on Databricks.",
18 changes: 14 additions & 4 deletions skills/databricks-model-serving/SKILL.md
@@ -1,9 +1,9 @@
---
name: databricks-model-serving
description: "Manage Databricks Model Serving endpoints via CLI. Use when asked to create, configure, query, or manage model serving endpoints for LLM inference, custom models, or external models."
description: "Databricks Model Serving (ops) plus MLflow model development (dev): manage serving endpoints, train and register models to Unity Catalog with @prod aliases, batch-score via spark_udf, build custom PyFunc / ResponsesAgent models, and discover Foundation Model API endpoints."
compatibility: Requires databricks CLI (>= v0.294.0)
metadata:
version: "0.1.0"
version: "0.3.0"
parent: databricks-core
---

@@ -17,7 +17,7 @@ Model Serving provides managed endpoints for serving LLMs, custom ML models, and

| Type | When to Use | Key Detail |
|------|-------------|------------|
| Pay-per-token | Foundation Model APIs (Llama, DBRX, etc.) | Uses `system.ai.*` catalog models, simplest setup |
| Pay-per-token | Foundation Model APIs (Llama, GPT-5, Claude, Gemini, etc.) | Uses `system.ai.*` catalog models, simplest setup. Discover endpoints at runtime — see [references/training-and-serving.md § Foundation Model API endpoints](references/training-and-serving.md#foundation-model-api-endpoints). |
| Provisioned throughput | Dedicated GPU capacity | Guaranteed throughput, higher cost |
| Custom model | Your own MLflow models or containers | Deploy any model with an MLflow signature |

@@ -74,7 +74,7 @@ databricks serving-endpoints create <ENDPOINT_NAME> \
}' --profile <PROFILE>
```

- Discover available Foundation Models: check the `system.ai` catalog in Unity Catalog, or use `databricks serving-endpoints list --profile <PROFILE>` to see available endpoints. Use `databricks serving-endpoints get-open-api <ENDPOINT_NAME> --profile <PROFILE>` to inspect the endpoint's API schema.
- Discover available Foundation Models: see [references/training-and-serving.md § Foundation Model API endpoints](references/training-and-serving.md#foundation-model-api-endpoints) for the runtime-list snippet and default-picking rules. You can also check the `system.ai` catalog in Unity Catalog, or run `databricks serving-endpoints list --profile <PROFILE>` to see what's deployed in the workspace. Use `databricks serving-endpoints get-open-api <ENDPOINT_NAME> --profile <PROFILE>` to inspect a specific endpoint's API schema.
- Long-running operation; the CLI waits for completion by default. Use `--no-wait` to return immediately, then poll:
```bash
databricks serving-endpoints get <ENDPOINT_NAME> --profile <PROFILE>
@@ -177,6 +177,16 @@ env:

Then add a tRPC route to call it from your app. For the full app integration pattern, use the **`databricks-apps`** skill and read the [Model Serving Guide](../databricks-apps/references/appkit/model-serving.md).

### Develop & deploy new models

This skill is ops-focused (manage existing endpoints). For the dev-side flow — train a model, register to Unity Catalog, log a PyFunc or `ResponsesAgent`, deploy — see the references below.

| Reference | When to read |
|---|---|
| [references/training-and-serving.md](references/training-and-serving.md) | Train + register classical ML with `mlflow.autolog`, alias-based promotion (`@prod`), batch scoring via `spark_udf`, real-time endpoint create + zero-downtime version swap, async deploy via `jobs submit --no-wait`. Includes the Foundation Model API endpoints runtime-list and the gotchas table. |
| [references/custom-pyfunc.md](references/custom-pyfunc.md) | When `autolog` isn't enough — file-based `PythonModel` ("Models from Code"), `infer_signature`, `code_paths`, pre-deploy validation with `mlflow.models.predict(env_manager="uv")`. |
| [references/genai-agents.md](references/genai-agents.md) | Hand-rolled `ResponsesAgent` with LangGraph + `UCFunctionToolkit` + `VectorSearchRetrieverTool`. Includes the `create_text_output_item` helper-method gotcha and the `resources=[...]` passthrough-auth list. |

## Troubleshooting

| Error | Solution |
106 changes: 106 additions & 0 deletions skills/databricks-model-serving/references/custom-pyfunc.md
@@ -0,0 +1,106 @@
# Custom pyfunc model

Use a custom pyfunc when sklearn / XGBoost autolog isn't enough, for example: custom preprocessing not captured by a sklearn pipeline, multiple sub-models behind one endpoint, external API calls during inference, or business-logic-heavy post-processing.

Same UC registry + serving story as classical ML — only the *logging* step changes.

## End-to-end example: file-based pyfunc with preprocessing + sub-model

Project layout:

```
my_model/
├── model.py # PythonModel + mlflow.models.set_model(...)
├── log_model.py # Logs + registers to UC
└── artifacts/
    ├── preprocessor.pkl
    └── booster.json
```

```python
# model.py — logged verbatim via python_model="model.py" (Models from Code).
# DO NOT pickle a class instance; use this file-path pattern instead.
import pickle

import mlflow
import pandas as pd
from mlflow.pyfunc import PythonModel

class TurbineRiskModel(PythonModel):
    def load_context(self, context):
        # Runs once when the model is loaded; artifact keys match log_model(artifacts=...).
        with open(context.artifacts["preprocessor"], "rb") as f:
            self.pre = pickle.load(f)
        from xgboost import Booster
        self.booster = Booster()
        self.booster.load_model(context.artifacts["booster"])

    def predict(self, context, model_input: pd.DataFrame, params=None) -> pd.DataFrame:
        from xgboost import DMatrix
        X = self.pre.transform(model_input)
        # Booster.predict expects a DMatrix, not a raw array or DataFrame.
        proba = self.booster.predict(DMatrix(X))
        return pd.DataFrame({
            "risk_score": proba,
            "risk_level": ["HIGH" if p > 0.7 else "MEDIUM" if p > 0.4 else "LOW" for p in proba],
        })

mlflow.models.set_model(TurbineRiskModel())
```
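
The thresholding inside `predict` can be pulled into a small pure function so it can be unit-tested without loading any artifacts. This is a minimal sketch that assumes the same 0.7 / 0.4 cut-offs as the model above; `risk_level` and `risk_levels` are hypothetical helper names, not part of the original `model.py`.

```python
def risk_level(p: float) -> str:
    """Bucket one probability using the cut-offs from TurbineRiskModel.predict."""
    if p > 0.7:
        return "HIGH"
    if p > 0.4:
        return "MEDIUM"
    return "LOW"


def risk_levels(probabilities):
    """Apply risk_level to every probability, preserving order."""
    return [risk_level(p) for p in probabilities]
```

Both comparisons are strict, so a score of exactly 0.7 lands in `MEDIUM` and exactly 0.4 lands in `LOW`, matching the list comprehension in `predict`.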

```python
# log_model.py
import mlflow
import pandas as pd  # needed for the sample_input / sample_output frames below
from mlflow.models import infer_signature
from mlflow.tracking import MlflowClient

mlflow.set_registry_uri("databricks-uc")
mlflow.set_experiment("/Users/me@example.com/turbine_risk")

CATALOG, SCHEMA, NAME = "ai_demo_gen", "wind_farm", "turbine_risk"
FULL_NAME = f"{CATALOG}.{SCHEMA}.{NAME}"

sample_input = pd.DataFrame({"vib_rms": [0.4], "rpm_mean": [18.2], "bearing_temp_max": [71.3]})
sample_output = pd.DataFrame({"risk_score": [0.0], "risk_level": ["LOW"]})

with mlflow.start_run():
    info = mlflow.pyfunc.log_model(
        name="model",  # MLflow 3 keyword; on MLflow 2.x use artifact_path="model"
        python_model="model.py",  # file path, not an instance
        artifacts={
            "preprocessor": "artifacts/preprocessor.pkl",
            "booster": "artifacts/booster.json",
        },
        signature=infer_signature(sample_input, sample_output),
        input_example=sample_input,
        # Pin exact versions: the endpoint rebuilds the env from these.
        pip_requirements=["mlflow==2.22.0", "xgboost==2.1.3", "scikit-learn==1.5.2", "pandas"],
        # Extra modules to ship with the model (e.g. shared util libs):
        # code_paths=["src/utils.py"],
        registered_model_name=FULL_NAME,
    )

# Pre-deploy validation — rebuilds the env locally and runs predict().
# Catches missing deps / signature drift BEFORE the endpoint does.
mlflow.models.predict(
    model_uri=info.model_uri,
    input_data=sample_input,
    env_manager="uv",  # MLflow ≥ 2.22; falls back to "virtualenv" otherwise
)

# Promote to @prod
client = MlflowClient(registry_uri="databricks-uc")
v = max(client.search_model_versions(f"name='{FULL_NAME}'"), key=lambda x: int(x.version)).version
client.set_registered_model_alias(FULL_NAME, "prod", v)
```
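
The promotion step above casts `version` with `int(...)` before taking the maximum. The sketch below shows why, assuming version numbers can arrive as strings: string comparison would rank `"9"` above `"10"`. The `SimpleNamespace` objects are stand-ins for the model-version objects returned by `search_model_versions`, and `latest_version` is a hypothetical helper name.

```python
from types import SimpleNamespace


def latest_version(model_versions):
    """Return the highest version, comparing numerically rather than as text."""
    return max(model_versions, key=lambda mv: int(mv.version)).version


# Stand-ins for registry results; only the .version attribute is used here.
versions = [SimpleNamespace(version=v) for v in ("2", "9", "10")]

newest = latest_version(versions)                                 # numeric comparison
lexicographic = max(versions, key=lambda mv: mv.version).version  # string comparison
```

With these inputs `newest` is `"10"` while `lexicographic` is `"9"`, so dropping the cast would point `@prod` at a stale version once the registry reaches ten versions.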

**Why `python_model="model.py"`**: file logged verbatim, no class pickling — avoids Python-version unpickle crashes between training and serving runtimes. Pair with `code_paths=[...]` to ship companion modules; `mlflow.models.set_model(instance)` at end of file is the contract (exactly one call).

## Consume

Same two paths as autologged classical ML — see [training-and-serving.md](training-and-serving.md#consume-batch-scoring-over-delta).

- **Batch**: `mlflow.pyfunc.spark_udf(spark, model_uri=f"models:/{FULL_NAME}@prod", env_manager="local")` over a Delta table.
- **Real-time**: `client.create_endpoint(...)` (see training-and-serving.md). Query returns a DataFrame-shaped JSON since `predict` returns a DataFrame.

```bash
databricks serving-endpoints query turbine-risk-endpoint --json '{
"dataframe_records": [{"vib_rms": 0.6, "rpm_mean": 19.0, "bearing_temp_max": 78.0}]
}'
# → {"predictions": [{"risk_score": 0.82, "risk_level": "HIGH"}]}
```
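
The same request body can be built in Python, and the response parsed, with only the standard library. This is a minimal sketch that assumes the `dataframe_records` request shape and the `predictions` response key shown in the example above; `build_payload` and `parse_predictions` are hypothetical helper names, and the HTTP call itself is left out.

```python
import json


def build_payload(records):
    """Serialize a list of row dicts into the dataframe_records request body."""
    return json.dumps({"dataframe_records": records})


def parse_predictions(response_text):
    """Extract the list of prediction rows from a serving response body."""
    return json.loads(response_text)["predictions"]


payload = build_payload(
    [{"vib_rms": 0.6, "rpm_mean": 19.0, "bearing_temp_max": 78.0}]
)
rows = parse_predictions(
    '{"predictions": [{"risk_score": 0.82, "risk_level": "HIGH"}]}'
)
```

Each entry in `rows` is a dict with the `risk_score` and `risk_level` columns that `predict` returns.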