2 changes: 1 addition & 1 deletion skills/databricks-core/SKILL.md
@@ -1,6 +1,6 @@
---
name: "databricks-core"
description: "Databricks CLI operations: auth, profiles, data exploration, and bundles. Contains up-to-date guidelines for Databricks-related CLI tasks."
description: "Configure Databricks CLI authentication and profiles, explore catalog/schema/table data, and deploy Databricks Asset Bundles (DABs). Use when the user asks about Databricks CLI commands, authentication setup, workspace configuration, bundle deployment, or data exploration via Databricks."
compatibility: Requires databricks CLI (>= v0.292.0)
metadata:
version: "0.1.0"
80 changes: 55 additions & 25 deletions skills/databricks-dabs/SKILL.md
@@ -1,39 +1,69 @@
---
name: databricks-dabs
description: 'Create, configure, validate, deploy, run, and manage DABs — Declarative Automation Bundles (formerly Databricks Asset Bundles) — for Databricks resources including dashboards, jobs, pipelines, alerts, volumes, and apps'
description: "Create, configure, validate, deploy, run, and manage DABs -- Declarative Automation Bundles (formerly Databricks Asset Bundles) -- for Databricks resources including dashboards, jobs, pipelines, alerts, volumes, and apps. Use when the user asks about DABs, Databricks bundles, deploying Databricks resources, or managing bundle configurations."
compatibility: Requires databricks CLI (>= v0.292.0)
metadata:
version: "0.1.0"
---

# Declarative Automation Bundles (DABs)

Use this skill for any bundle-related request including creating, configuring, validating, deploying, running, and managing Databricks resources through DABs.
**FIRST**: Use the parent `databricks-core` skill for CLI basics, authentication, and profile selection.
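
Each skill declares a minimum CLI version in its `compatibility` field (>= v0.292.0 here, >= v0.294.0 for Lakebase). A minimal Python sketch for checking that against `databricks -v`, assuming the output contains a dotted version such as `Databricks CLI v0.292.0`; both helper names are hypothetical.

```python
import re
from typing import Tuple


def parse_cli_version(output: str) -> Tuple[int, ...]:
    """Extract (major, minor, patch) from text such as 'Databricks CLI v0.292.0'."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version found in {output!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(output: str, minimum: str = "0.292.0") -> bool:
    """True if the reported CLI version is at least `minimum` (compared as integer tuples)."""
    return parse_cli_version(output) >= parse_cli_version(minimum)
```

Comparing integer tuples rather than strings keeps `0.1000.0` above `0.292.0`. Pass it the captured stdout of `databricks -v`.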

## Reference Documentation
## Quick-Start Workflow

```bash
# 1. Create a new bundle project
databricks bundle init --profile <PROFILE>

# 2. Configure databricks.yml and resource YAML files
# Resource files: resources/<name>.<resource_type>.yml

# 3. Validate
databricks bundle validate --strict --target <target> --profile <PROFILE>

# 4. Deploy
databricks bundle deploy -t <target> --profile <PROFILE>

The following reference files provide detailed guidance for specific bundle tasks:
# 5. Run a specific resource
databricks bundle run <RESOURCE> -t <target> --profile <PROFILE>
```
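
The quick-start steps can also be driven from a script. A minimal Python sketch, assuming `databricks` is on `PATH`; `bundle_cmd` and `run_step` are hypothetical helpers, and only flags shown above are used (`--target` is the long form of `-t`).

```python
import shlex
import subprocess
from typing import List, Optional


def bundle_cmd(action: str, profile: str, target: Optional[str] = None,
               resource: Optional[str] = None, strict: bool = False) -> List[str]:
    """Build the argv for `databricks bundle <action>` (validate, deploy, run, ...)."""
    argv = ["databricks", "bundle", action]
    if resource is not None:
        argv.append(resource)          # resource key for `bundle run`, e.g. a job key
    if strict:
        argv.append("--strict")        # used with `bundle validate` above
    if target is not None:
        argv += ["--target", target]   # same as -t
    argv += ["--profile", profile]
    return argv


def run_step(argv: List[str]) -> None:
    """Echo and run one step; check=True raises CalledProcessError on a non-zero exit."""
    print("$", shlex.join(argv))
    subprocess.run(argv, check=True)
```

For example, `run_step(bundle_cmd("validate", "DEFAULT", "dev", strict=True))` followed by `run_step(bundle_cmd("deploy", "DEFAULT", "dev"))` stops before deploying if validation exits non-zero.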

- **[Bundle Structure](references/bundle-structure.md)** - Bundle structure, databricks.yml configuration, resource definitions, path resolution, variables, and multi-environment targets
- **[SDP Pipelines](references/sdp-pipelines.md)** - Spark Declarative Pipeline configurations for DABs
- **[SQL Alerts](references/alerts.md)** - SQL Alert schemas and configuration (critical - API differs from other resources)
- **[Deploy and Run](references/deploy-and-run.md)** - Validation, deployment, running resources, monitoring logs, and troubleshooting common issues
- **[Resource Permissions](references/resource-permissions.md)** - Permission levels and access control for bundle resources, per-resource-type levels, grants vs permissions
### Minimal databricks.yml

## When to Use This Skill
```yaml
bundle:
  name: my-project

Load this skill for any request involving:
workspace:
  host: https://my-workspace.cloud.databricks.com

- Creating new bundle projects or resources
- Configuring databricks.yml or resource YAML files
- Setting up multi-environment deployments (dev/prod targets)
- Deploying or running bundle resources
- Managing permissions for bundle resources
- Troubleshooting bundle validation or deployment errors
- Working with specific resource types (dashboards, jobs, pipelines, alerts, volumes, apps)
variables:
  catalog:
    default: dev_catalog
  schema:
    default: my_schema

## General Guidelines
targets:
  dev:
    default: true
  prod:
    variables:
      catalog: prod_catalog
```
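
Targets override top-level variable defaults, so `prod` above gets `prod_catalog` while `schema` keeps its default. A minimal Python model of that merge, assuming a target's `variables` entries simply replace the matching defaults; it is an illustration of the configuration above, not the CLI's resolver.

```python
from typing import Dict, Optional

# The minimal databricks.yml above, as the dicts a YAML parser would produce.
BUNDLE = {
    "variables": {
        "catalog": {"default": "dev_catalog"},
        "schema": {"default": "my_schema"},
    },
    "targets": {
        "dev": {"default": True},
        "prod": {"variables": {"catalog": "prod_catalog"}},
    },
}


def default_target(bundle: dict) -> str:
    """Return the target marked `default: true`."""
    for name, cfg in bundle["targets"].items():
        if cfg.get("default"):
            return name
    raise ValueError("no default target")


def resolve_variables(bundle: dict, target: Optional[str] = None) -> Dict[str, str]:
    """Start from each variable's default, then apply the selected target's overrides."""
    target = target or default_target(bundle)
    values = {name: spec["default"] for name, spec in bundle["variables"].items()}
    values.update(bundle["targets"][target].get("variables", {}))
    return values


print(resolve_variables(BUNDLE))          # {'catalog': 'dev_catalog', 'schema': 'my_schema'}
print(resolve_variables(BUNDLE, "prod"))  # {'catalog': 'prod_catalog', 'schema': 'my_schema'}
```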

## Guidelines

1. **Always validate after changes** -- `bundle validate --strict --target <target>`
2. **Follow naming conventions** -- Resource files use `<name>.<resource_type>.yml`
3. **Path resolution is critical** -- Paths differ based on file location (see Bundle Structure reference)
4. **Preserve existing structure** -- Keep user comments and structure when editing YAML
5. **Use variables** -- Parameterize catalog, schema, and warehouse for multi-environment support
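
Guideline 2 can be checked mechanically. A minimal Python sketch; the set of accepted `<resource_type>` suffixes is an assumption (the singular forms of the resource types this skill lists), so extend it for any other types your bundle uses.

```python
import re
from typing import Optional, Tuple

# Assumed suffixes: singular forms of the resource types named in this skill.
KNOWN_TYPES = {"job", "pipeline", "dashboard", "alert", "volume", "app"}

# <name>.<resource_type>.yml, e.g. "nightly_etl.job.yml"
PATTERN = re.compile(r"^(?P<name>[A-Za-z0-9_-]+)\.(?P<rtype>[a-z_]+)\.yml$")


def parse_resource_filename(filename: str) -> Optional[Tuple[str, str]]:
    """Return (name, resource_type) if the filename follows the convention, else None."""
    match = PATTERN.match(filename)
    if match is None or match.group("rtype") not in KNOWN_TYPES:
        return None
    return match.group("name"), match.group("rtype")
```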

## Reference Documentation

1. **Always validate after configuration changes** - Use `bundle validate --strict --target <target>` after any change
2. **Use reference documentation** - Consult the appropriate reference file for detailed patterns and examples
3. **Follow naming conventions** - Resource files should use `<name>.<resource_type>.yml` format
4. **Path resolution is critical** - Paths differ based on file location (see Bundle Structure reference)
5. **Preserve existing structure** - Keep user comments and structure when editing YAML files
6. **Use variables** - Parameterize catalog, schema, and warehouse for multi-environment support
- **[Bundle Structure](references/bundle-structure.md)** -- databricks.yml configuration, resource definitions, path resolution, variables, multi-environment targets
- **[SDP Pipelines](references/sdp-pipelines.md)** -- Spark Declarative Pipeline configurations for DABs
- **[SQL Alerts](references/alerts.md)** -- SQL Alert schemas and configuration (API differs from other resources)
- **[Deploy and Run](references/deploy-and-run.md)** -- Validation, deployment, running resources, monitoring, troubleshooting
- **[Resource Permissions](references/resource-permissions.md)** -- Permission levels, access control, grants vs permissions
110 changes: 12 additions & 98 deletions skills/databricks-jobs/SKILL.md
@@ -1,10 +1,10 @@
---
name: databricks-jobs
description: Develop and deploy Lakeflow Jobs on Databricks. Use when creating data engineering jobs with notebooks, Python wheels, or SQL tasks. Invoke BEFORE starting implementation.
description: "Develop and deploy Lakeflow Jobs on Databricks: create notebook, Python wheel, and SQL tasks, configure schedules and task dependencies, and manage job parameters. Use when creating data engineering jobs with notebooks, Python wheels, or SQL tasks. Invoke BEFORE starting implementation."
compatibility: Requires databricks CLI (>= v0.292.0)
metadata:
version: "0.1.0"
parent: databricks-core
parent: databricks-core
---

# Lakeflow Jobs Development
@@ -23,29 +23,7 @@ databricks bundle init default-python --config-file <(echo '{"project_name": "my

- `project_name`: letters, numbers, underscores only

After scaffolding, create `CLAUDE.md` and `AGENTS.md` in the project directory. These files are essential to provide agents with guidance on how to work with the project. Use this content:

```
# Declarative Automation Bundles Project

This project uses Declarative Automation Bundles (formerly Databricks Asset Bundles) for deployment.

## Prerequisites

Install the Databricks CLI (>= v0.288.0) if not already installed:
- macOS: `brew tap databricks/tap && brew install databricks`
- Linux: `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh`
- Windows: `winget install Databricks.DatabricksCLI`

Verify: `databricks -v`

## For AI Agents

Read the `databricks-core` skill for CLI basics, authentication, and deployment workflow.
Read the `databricks-jobs` skill for job-specific guidance.

If skills are not available, install them: `databricks experimental aitools install`
```
After scaffolding, create `CLAUDE.md` and `AGENTS.md` pointing agents to the `databricks-core` and `databricks-jobs` skills.
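
A minimal Python sketch for that step, assuming both files get the same pointer text; the wording is adapted from the earlier version of this section, and `write_agent_files` is hypothetical.

```python
from pathlib import Path
from typing import List

# Pointer text adapted from the previous version of this section.
AGENT_GUIDE = """\
# Declarative Automation Bundles Project

This project uses Declarative Automation Bundles (formerly Databricks Asset Bundles) for deployment.

## For AI Agents

Read the `databricks-core` skill for CLI basics, authentication, and deployment workflow.
Read the `databricks-{skill}` skill for {topic}-specific guidance.

If skills are not available, install them: `databricks experimental aitools install`
"""


def write_agent_files(project_dir: str, skill: str = "jobs", topic: str = "job") -> List[Path]:
    """Write identical CLAUDE.md and AGENTS.md files into the scaffolded project."""
    text = AGENT_GUIDE.format(skill=skill, topic=topic)
    written = []
    for name in ("CLAUDE.md", "AGENTS.md"):
        path = Path(project_dir) / name
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written
```

`write_agent_files("my_job_project")` covers a jobs project; `write_agent_files(path, skill="pipelines", topic="pipeline")` covers a pipelines project.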

## Project Structure

@@ -66,89 +44,24 @@ my-job-project/

## Configuring Tasks

Edit `resources/<job_name>.job.yml` to configure tasks:
Edit `resources/<job_name>.job.yml`. Task types: `notebook_task`, `python_wheel_task`, `spark_python_task`, `pipeline_task`, `sql_task`. Use `depends_on` for multi-task DAGs. Job-level `parameters` are passed to ALL tasks (access in notebooks via `dbutils.widgets.get("catalog")`).

```yaml
resources:
  jobs:
    my_job:
      name: my_job

      tasks:
        - task_key: my_notebook
          notebook_task:
            notebook_path: ../src/my_notebook.ipynb

        - task_key: my_python
          depends_on:
            - task_key: my_notebook
          python_wheel_task:
            package_name: my_package
            entry_point: main
```

Task types: `notebook_task`, `python_wheel_task`, `spark_python_task`, `pipeline_task`, `sql_task`

## Job Parameters

Parameters defined at job level are passed to ALL tasks (no need to repeat per task):

```yaml
resources:
  jobs:
    my_job:
      parameters:
        - name: catalog
          default: ${var.catalog}
        - name: schema
          default: ${var.schema}
```

Access parameters in notebooks with `dbutils.widgets.get("catalog")`.

## Writing Notebook Code

```python
# Read parameters
catalog = dbutils.widgets.get("catalog")
schema = dbutils.widgets.get("schema")

# Read tables
df = spark.read.table(f"{catalog}.{schema}.my_table")

# SQL queries
result = spark.sql(f"SELECT * FROM {catalog}.{schema}.my_table LIMIT 10")

# Write output
df.write.mode("overwrite").saveAsTable(f"{catalog}.{schema}.output_table")
```

## Scheduling

```yaml
resources:
  jobs:
    my_job:
      trigger:
        periodic:
          interval: 1
          unit: DAYS
```

Or with cron:
# Or use cron: schedule: { quartz_cron_expression: "0 0 2 * * ?", timezone_id: "UTC" }

```yaml
schedule:
  quartz_cron_expression: "0 0 2 * * ?"
  timezone_id: "UTC"
```
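
Quartz expressions have six required fields (seconds, minutes, hours, day-of-month, month, day-of-week) plus an optional year, so `0 0 2 * * ?` fires daily at 02:00:00 in the given `timezone_id`. A minimal Python sketch that labels the fields and applies Quartz's rule that exactly one of the two day fields is `?`; it does not validate field values.

```python
from typing import Dict

# Quartz field order: six required fields plus an optional year.
QUARTZ_FIELDS = ["seconds", "minutes", "hours", "day_of_month", "month", "day_of_week", "year"]


def describe_quartz(expr: str) -> Dict[str, str]:
    """Label each field of a Quartz cron expression; field values are not validated."""
    parts = expr.split()
    if not 6 <= len(parts) <= 7:
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    fields = dict(zip(QUARTZ_FIELDS, parts))
    # Quartz requires '?' in exactly one of the two day fields.
    if (fields["day_of_month"] == "?") == (fields["day_of_week"] == "?"):
        raise ValueError("use '?' in exactly one of day_of_month / day_of_week")
    return fields


print(describe_quartz("0 0 2 * * ?")["hours"])  # 2
```

A five-field Unix cron string such as `0 2 * * *` is rejected, which catches a common copy-paste mistake.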

## Multi-Task Jobs with Dependencies

```yaml
resources:
  jobs:
    my_pipeline_job:
      tasks:
        - task_key: extract
          notebook_task:
@@ -160,11 +73,12 @@ resources:
          notebook_task:
            notebook_path: ../src/transform.ipynb

        - task_key: load
        - task_key: load_wheel
          depends_on:
            - task_key: transform
          notebook_task:
            notebook_path: ../src/load.ipynb
          python_wheel_task:
            package_name: my_package
            entry_point: main
```
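
`depends_on` turns the tasks into a DAG, and a valid run order is any topological order of it. A minimal Python sketch using the standard library's `graphlib`; the `transform` to `extract` edge is assumed because those lines are collapsed in this diff. Databricks schedules the tasks itself, so this only models the ordering, for example as a local pre-check.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List

# task_key -> keys listed under its depends_on (transform -> extract is assumed).
TASKS: Dict[str, List[str]] = {
    "extract": [],
    "transform": ["extract"],
    "load_wheel": ["transform"],
}


def execution_order(tasks: Dict[str, List[str]]) -> List[str]:
    """Return one valid run order; raises graphlib.CycleError (a ValueError) on cycles."""
    for key, deps in tasks.items():
        missing = [d for d in deps if d not in tasks]
        if missing:
            raise KeyError(f"{key} depends on unknown task(s): {missing}")
    return list(TopologicalSorter(tasks).static_order())


print(execution_order(TASKS))  # ['extract', 'transform', 'load_wheel']
```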

## Unit Testing
@@ -177,10 +91,10 @@ uv run pytest

## Development Workflow

1. **Validate**: `databricks bundle validate --profile <profile>`
2. **Deploy**: `databricks bundle deploy -t dev --profile <profile>`
1. **Validate**: `databricks bundle validate --profile <profile>` -- fix any YAML or schema errors before proceeding
2. **Deploy**: `databricks bundle deploy -t dev --profile <profile>` -- if `PERMISSION_DENIED`, check workspace permissions and profile
3. **Run**: `databricks bundle run <job_name> -t dev --profile <profile>`
4. **Check run status**: `databricks jobs get-run --run-id <id> --profile <profile>`
4. **Check run status**: `databricks jobs get-run --run-id <id> --profile <profile>` -- if `FAILED`, check `run_page_url` for task-level errors
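
A minimal Python sketch for step 4, assuming the CLI's global `--output json` flag is added to the command and the response carries the Jobs API run fields `state.result_state`, `state.life_cycle_state`, `run_page_url`, and per-task `tasks[].task_key` / `tasks[].state.result_state`. Check those names against the Jobs API reference for your CLI version; `summarize_run` is hypothetical.

```python
import json
from typing import List, Tuple


def summarize_run(run_json: str) -> Tuple[str, List[str], str]:
    """Return (result_state, failed task keys, run_page_url) from `jobs get-run` JSON output."""
    run = json.loads(run_json)
    state = run.get("state", {})
    # result_state is absent while the run is still pending or running.
    result = state.get("result_state") or state.get("life_cycle_state", "UNKNOWN")
    failed = [
        task["task_key"]
        for task in run.get("tasks", [])
        if task.get("state", {}).get("result_state") == "FAILED"
    ]
    return result, failed, run.get("run_page_url", "")
```

When the first element is `FAILED`, the failed task keys say which task to open from `run_page_url`.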

## Documentation

33 changes: 5 additions & 28 deletions skills/databricks-lakebase/SKILL.md
@@ -1,10 +1,10 @@
---
name: databricks-lakebase
description: "Databricks Lakebase Postgres: projects, scaling, connectivity, Lakebase synced tables, and Data API. Use when asked about Lakebase databases, OLTP storage, or connecting apps to Postgres on Databricks."
description: "Create and manage Databricks Lakebase Postgres projects, configure scaling and connectivity, set up Lakebase synced tables, and query via the Data API. Use when asked about Lakebase databases, OLTP storage, or connecting apps to Postgres on Databricks."
compatibility: Requires databricks CLI (>= v0.294.0)
metadata:
version: "0.1.0"
parent: databricks-core
parent: databricks-core
---

# Lakebase Postgres Autoscaling
@@ -17,17 +17,7 @@ Lakebase is Databricks' serverless Postgres-compatible database, available on bo

**Compliance:** Supports HIPAA, C5, TISAX, or None.

## Capabilities

- **Project lifecycle** -- create, update, delete Lakebase Postgres Autoscaling projects
- **Branching** -- copy-on-write branches with TTL, point-in-time recovery, and reset
- **Compute scaling** -- autoscale 0.5--32 CU, fixed 36--112 CU, scale-to-zero
- **High availability** -- 1 primary + 1--3 secondaries, automatic failover
- **PostgreSQL connectivity** -- OAuth token refresh, connection pooling, SSL
- **Data API** -- PostgREST-compatible HTTP CRUD (Autoscaling only)
- **Lakebase synced tables** -- sync Unity Catalog Delta tables into Postgres (previously known as Reverse ETL)
- **Databricks App integration** -- scaffold apps with Lakebase feature, deploy-first workflow
- **Cloud support** -- AWS and Azure (GA)
**Capabilities:** Project lifecycle, copy-on-write branching (TTL, point-in-time recovery), autoscale 0.5--112 CU with scale-to-zero, HA with 1--3 secondaries, OAuth-based Postgres connectivity, PostgREST Data API, Lakebase synced tables (Delta-to-Postgres), Databricks App integration, AWS and Azure (GA).

**Reference docs:**
- [computes-and-scaling.md](references/computes-and-scaling.md) — Sizing, endpoint management, scale-to-zero, HA
@@ -144,22 +134,9 @@ databricks postgres reset-branch projects/<PROJECT_ID>/branches/<BRANCH_ID> --pr

**Delete:** Protected branches must be unprotected first (`update-branch` to set `spec.is_protected` to `false`). Cannot delete branches with children. **Never delete the `production` branch.**
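
The three rules can be expressed as a pre-delete check. A minimal Python sketch; branches are plain dicts with hypothetical `id` and `parent_id` keys, and only `spec.is_protected` comes from this skill.

```python
from typing import Dict, List


def deletion_blockers(branch: Dict, branches: List[Dict]) -> List[str]:
    """Return the reasons `branch` cannot be deleted yet; an empty list means no blocker found."""
    reasons = []
    if branch["id"] == "production":
        reasons.append("never delete the production branch")
    if branch.get("spec", {}).get("is_protected"):
        reasons.append("unprotect first (update-branch with spec.is_protected = false)")
    children = [b["id"] for b in branches if b.get("parent_id") == branch["id"]]
    if children:
        reasons.append(f"has child branches: {children}")
    return reasons
```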

## Key Differences from Lakebase Provisioned

> All new instances default to Autoscaling as of March 2026. Automatic migration of Provisioned instances begins June 2026.

| Aspect | Provisioned | Autoscaling |
|--------|-------------|-------------|
| CLI group | `databricks database` | `databricks postgres` |
| Top-level resource | Instance | Project |
| Capacity | CU_1--CU_8 (16 GB/CU) | 0.5--112 CU (2 GB/CU) |
| Branching | Not supported | Full support |
| Scale-to-zero | Not supported | Configurable |
| HA | Readable secondaries | 1--3 secondaries + read replicas |
| Data API | Not available | PostgREST HTTP API |
| Cloud | AWS only | AWS and Azure |
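
The per-CU memory figures mean CU counts are not comparable across the two modes: a Provisioned CU carries 8x the memory of an Autoscaling CU. A minimal Python sketch of that arithmetic using only the table's numbers; it checks the documented range but not which discrete sizes are offered within it.

```python
# Memory per compute unit and CU range, from the comparison table above.
GB_PER_CU = {"provisioned": 16, "autoscaling": 2}
CU_RANGE = {"provisioned": (1, 8), "autoscaling": (0.5, 112)}


def memory_gb(cu: float, mode: str) -> float:
    """Approximate memory for a compute size, rejecting sizes outside the documented range."""
    low, high = CU_RANGE[mode]
    if not low <= cu <= high:
        raise ValueError(f"{cu} CU is outside the {mode} range {low}-{high} CU")
    return cu * GB_PER_CU[mode]


print(memory_gb(8, "provisioned"))    # 128
print(memory_gb(0.5, "autoscaling"))  # 1.0
print(memory_gb(112, "autoscaling"))  # 224
```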
## Provisioned vs Autoscaling

**Migration:** Manual via `pg_dump`/`pg_restore` (requires pausing writes). Automatic seamless upgrades (seconds of downtime) begin June 2026 -- no customer action required.
All new instances default to Autoscaling (March 2026). Provisioned uses `databricks database` CLI; Autoscaling uses `databricks postgres`. Autoscaling adds branching, scale-to-zero, Data API, and Azure support. Automatic migration begins June 2026.

## What's Next

28 changes: 3 additions & 25 deletions skills/databricks-pipelines/SKILL.md
Original file line number Diff line number Diff line change
@@ -1,10 +1,10 @@
---
name: databricks-pipelines
description: Develop Lakeflow Spark Declarative Pipelines (formerly Delta Live Tables) on Databricks. Use when building batch or streaming data pipelines with Python or SQL. Invoke BEFORE starting implementation.
description: "Develop Lakeflow Spark Declarative Pipelines (formerly Delta Live Tables) on Databricks: create streaming tables, materialized views, configure data quality expectations, and set up Auto Loader or Auto CDC. Use when building batch or streaming data pipelines with Python or SQL. Invoke BEFORE starting implementation."
compatibility: Requires databricks CLI (>= v0.292.0)
metadata:
version: "0.1.0"
parent: databricks-core
parent: databricks-core
---

# Lakeflow Spark Declarative Pipelines Development
@@ -181,29 +181,7 @@ databricks bundle init lakeflow-pipelines --config-file <(echo '{"project_name":
- SQL: Recommended for straightforward transformations (filters, joins, aggregations)
- Python: Recommended for complex logic (custom UDFs, ML, advanced processing)

After scaffolding, create `CLAUDE.md` and `AGENTS.md` in the project directory. These files are essential to provide agents with guidance on how to work with the project. Use this content:

```
# Declarative Automation Bundles Project

This project uses Declarative Automation Bundles (formerly Databricks Asset Bundles) for deployment.

## Prerequisites

Install the Databricks CLI (>= v0.288.0) if not already installed:
- macOS: `brew tap databricks/tap && brew install databricks`
- Linux: `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh`
- Windows: `winget install Databricks.DatabricksCLI`

Verify: `databricks -v`

## For AI Agents

Read the `databricks-core` skill for CLI basics, authentication, and deployment workflow.
Read the `databricks-pipelines` skill for pipeline-specific guidance.

If skills are not available, install them: `databricks experimental aitools install`
```
After scaffolding, create `CLAUDE.md` and `AGENTS.md` pointing agents to the `databricks-core` and `databricks-pipelines` skills.

## Pipeline Structure
