From dfd0909a533c1ac4ebbdc0b01eb78ce259646b35 Mon Sep 17 00:00:00 2001
From: rohan-tessl
Date: Wed, 6 May 2026 10:38:42 +0530
Subject: [PATCH] feat: improve 5 lowest-scoring skill definitions

---
 skills/databricks-core/SKILL.md      |   2 +-
 skills/databricks-dabs/SKILL.md      |  80 +++++++++++++------
 skills/databricks-jobs/SKILL.md      | 110 +++------------------------
 skills/databricks-lakebase/SKILL.md  |  33 ++------
 skills/databricks-pipelines/SKILL.md |  28 +------
 5 files changed, 76 insertions(+), 177 deletions(-)

diff --git a/skills/databricks-core/SKILL.md b/skills/databricks-core/SKILL.md
index 4d1abc9..0134f38 100644
--- a/skills/databricks-core/SKILL.md
+++ b/skills/databricks-core/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: "databricks-core"
-description: "Databricks CLI operations: auth, profiles, data exploration, and bundles. Contains up-to-date guidelines for Databricks-related CLI tasks."
+description: "Configure Databricks CLI authentication and profiles, explore catalog/schema/table data, and deploy Databricks Asset Bundles (DABs). Use when the user asks about Databricks CLI commands, authentication setup, workspace configuration, bundle deployment, or data exploration via Databricks."
 compatibility: Requires databricks CLI (>= v0.292.0)
 metadata:
   version: "0.1.0"
diff --git a/skills/databricks-dabs/SKILL.md b/skills/databricks-dabs/SKILL.md
index 4c91fd2..d3428d1 100644
--- a/skills/databricks-dabs/SKILL.md
+++ b/skills/databricks-dabs/SKILL.md
@@ -1,39 +1,69 @@
 ---
 name: databricks-dabs
-description: 'Create, configure, validate, deploy, run, and manage DABs — Declarative Automation Bundles (formerly Databricks Asset Bundles) — for Databricks resources including dashboards, jobs, pipelines, alerts, volumes, and apps'
+description: "Create, configure, validate, deploy, run, and manage DABs -- Declarative Automation Bundles (formerly Databricks Asset Bundles) -- for Databricks resources including dashboards, jobs, pipelines, alerts, volumes, and apps. Use when the user asks about DABs, Databricks bundles, deploying Databricks resources, or managing bundle configurations."
+compatibility: Requires databricks CLI (>= v0.292.0)
+metadata:
+  version: "0.1.0"
 ---

 # Declarative Automation Bundles (DABs)

-Use this skill for any bundle-related request including creating, configuring, validating, deploying, running, and managing Databricks resources through DABs.
+**FIRST**: Use the parent `databricks-core` skill for CLI basics, authentication, and profile selection.

-## Reference Documentation
+## Quick-Start Workflow
+
+```bash
+# 1. Create a new bundle project
+databricks bundle init --profile
+
+# 2. Configure databricks.yml and resource YAML files
+# Resource files: resources/..yml
+
+# 3. Validate
+databricks bundle validate --strict --target --profile
+
+# 4. Deploy
+databricks bundle deploy -t --profile

-The following reference files provide detailed guidance for specific bundle tasks:
+# 5. Run a specific resource
+databricks bundle run -t --profile
+```

-- **[Bundle Structure](references/bundle-structure.md)** - Bundle structure, databricks.yml configuration, resource definitions, path resolution, variables, and multi-environment targets
-- **[SDP Pipelines](references/sdp-pipelines.md)** - Spark Declarative Pipeline configurations for DABs
-- **[SQL Alerts](references/alerts.md)** - SQL Alert schemas and configuration (critical - API differs from other resources)
-- **[Deploy and Run](references/deploy-and-run.md)** - Validation, deployment, running resources, monitoring logs, and troubleshooting common issues
-- **[Resource Permissions](references/resource-permissions.md)** - Permission levels and access control for bundle resources, per-resource-type levels, grants vs permissions
+### Minimal databricks.yml

-## When to Use This Skill
+```yaml
+bundle:
+  name: my-project

-Load this skill for any request involving:
+workspace:
+  host: https://my-workspace.cloud.databricks.com

-- Creating new bundle projects or resources
-- Configuring databricks.yml or resource YAML files
-- Setting up multi-environment deployments (dev/prod targets)
-- Deploying or running bundle resources
-- Managing permissions for bundle resources
-- Troubleshooting bundle validation or deployment errors
-- Working with specific resource types (dashboards, jobs, pipelines, alerts, volumes, apps)
+variables:
+  catalog:
+    default: dev_catalog
+  schema:
+    default: my_schema

-## General Guidelines
+targets:
+  dev:
+    default: true
+  prod:
+    variables:
+      catalog: prod_catalog
+```
+
+## Guidelines
+
+1. **Always validate after changes** -- `bundle validate --strict --target `
+2. **Follow naming conventions** -- Resource files use `..yml`
+3. **Path resolution is critical** -- Paths differ based on file location (see Bundle Structure reference)
+4. **Preserve existing structure** -- Keep user comments and structure when editing YAML
+5. **Use variables** -- Parameterize catalog, schema, and warehouse for multi-environment support
+
+## Reference Documentation

-1. **Always validate after configuration changes** - Use `bundle validate --strict --target ` after any change
-2. **Use reference documentation** - Consult the appropriate reference file for detailed patterns and examples
-3. **Follow naming conventions** - Resource files should use `..yml` format
-4. **Path resolution is critical** - Paths differ based on file location (see Bundle Structure reference)
-5. **Preserve existing structure** - Keep user comments and structure when editing YAML files
-6. **Use variables** - Parameterize catalog, schema, and warehouse for multi-environment support
+- **[Bundle Structure](references/bundle-structure.md)** -- databricks.yml configuration, resource definitions, path resolution, variables, multi-environment targets
+- **[SDP Pipelines](references/sdp-pipelines.md)** -- Spark Declarative Pipeline configurations for DABs
+- **[SQL Alerts](references/alerts.md)** -- SQL Alert schemas and configuration (API differs from other resources)
+- **[Deploy and Run](references/deploy-and-run.md)** -- Validation, deployment, running resources, monitoring, troubleshooting
+- **[Resource Permissions](references/resource-permissions.md)** -- Permission levels, access control, grants vs permissions
diff --git a/skills/databricks-jobs/SKILL.md b/skills/databricks-jobs/SKILL.md
index f9986c1..a33cc5c 100644
--- a/skills/databricks-jobs/SKILL.md
+++ b/skills/databricks-jobs/SKILL.md
@@ -1,10 +1,10 @@
 ---
 name: databricks-jobs
-description: Develop and deploy Lakeflow Jobs on Databricks. Use when creating data engineering jobs with notebooks, Python wheels, or SQL tasks. Invoke BEFORE starting implementation.
+description: "Develop and deploy Lakeflow Jobs on Databricks: create notebook, Python wheel, and SQL tasks, configure schedules and task dependencies, and manage job parameters. Use when creating data engineering jobs with notebooks, Python wheels, or SQL tasks. Invoke BEFORE starting implementation."
 compatibility: Requires databricks CLI (>= v0.292.0)
 metadata:
   version: "0.1.0"
-parent: databricks-core
+  parent: databricks-core
 ---

 # Lakeflow Jobs Development
@@ -23,29 +23,7 @@ databricks bundle init default-python --config-file <(echo '{"project_name": "my

 - `project_name`: letters, numbers, underscores only

-After scaffolding, create `CLAUDE.md` and `AGENTS.md` in the project directory. These files are essential to provide agents with guidance on how to work with the project. Use this content:
-
-```
-# Declarative Automation Bundles Project
-
-This project uses Declarative Automation Bundles (formerly Databricks Asset Bundles) for deployment.
-
-## Prerequisites
-
-Install the Databricks CLI (>= v0.288.0) if not already installed:
-- macOS: `brew tap databricks/tap && brew install databricks`
-- Linux: `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh`
-- Windows: `winget install Databricks.DatabricksCLI`
-
-Verify: `databricks -v`
-
-## For AI Agents
-
-Read the `databricks-core` skill for CLI basics, authentication, and deployment workflow.
-Read the `databricks-jobs` skill for job-specific guidance.
-
-If skills are not available, install them: `databricks experimental aitools install`
-```
+After scaffolding, create `CLAUDE.md` and `AGENTS.md` pointing agents to the `databricks-core` and `databricks-jobs` skills.

 ## Project Structure

@@ -66,89 +44,24 @@ my-job-project/

 ## Configuring Tasks

-Edit `resources/.job.yml` to configure tasks:
+Edit `resources/.job.yml`. Task types: `notebook_task`, `python_wheel_task`, `spark_python_task`, `pipeline_task`, `sql_task`. Use `depends_on` for multi-task DAGs. Job-level `parameters` are passed to ALL tasks (access in notebooks via `dbutils.widgets.get("catalog")`).

 ```yaml
 resources:
   jobs:
     my_job:
       name: my_job
-
-      tasks:
-        - task_key: my_notebook
-          notebook_task:
-            notebook_path: ../src/my_notebook.ipynb
-
-        - task_key: my_python
-          depends_on:
-            - task_key: my_notebook
-          python_wheel_task:
-            package_name: my_package
-            entry_point: main
-```
-
-Task types: `notebook_task`, `python_wheel_task`, `spark_python_task`, `pipeline_task`, `sql_task`
-
-## Job Parameters
-
-Parameters defined at job level are passed to ALL tasks (no need to repeat per task):
-
-```yaml
-resources:
-  jobs:
-    my_job:
       parameters:
         - name: catalog
           default: ${var.catalog}
         - name: schema
           default: ${var.schema}
-```
-
-Access parameters in notebooks with `dbutils.widgets.get("catalog")`.
-
-## Writing Notebook Code
-
-```python
-# Read parameters
-catalog = dbutils.widgets.get("catalog")
-schema = dbutils.widgets.get("schema")
-
-# Read tables
-df = spark.read.table(f"{catalog}.{schema}.my_table")
-
-# SQL queries
-result = spark.sql(f"SELECT * FROM {catalog}.{schema}.my_table LIMIT 10")
-
-# Write output
-df.write.mode("overwrite").saveAsTable(f"{catalog}.{schema}.output_table")
-```
-
-## Scheduling
-
-```yaml
-resources:
-  jobs:
-    my_job:
       trigger:
         periodic:
           interval: 1
           unit: DAYS
-```
-
-Or with cron:
+      # Or use cron: schedule: { quartz_cron_expression: "0 0 2 * * ?", timezone_id: "UTC" }

-```yaml
-      schedule:
-        quartz_cron_expression: "0 0 2 * * ?"
-        timezone_id: "UTC"
-```
-
-## Multi-Task Jobs with Dependencies
-
-```yaml
-resources:
-  jobs:
-    my_pipeline_job:
       tasks:
         - task_key: extract
           notebook_task:
@@ -160,11 +73,12 @@ resources:
           notebook_task:
             notebook_path: ../src/transform.ipynb

-        - task_key: load
+        - task_key: load_wheel
           depends_on:
             - task_key: transform
-          notebook_task:
-            notebook_path: ../src/load.ipynb
+          python_wheel_task:
+            package_name: my_package
+            entry_point: main
 ```

 ## Unit Testing
@@ -177,10 +91,10 @@ uv run pytest

 ## Development Workflow

-1. **Validate**: `databricks bundle validate --profile `
-2. **Deploy**: `databricks bundle deploy -t dev --profile `
+1. **Validate**: `databricks bundle validate --profile ` -- fix any YAML or schema errors before proceeding
+2. **Deploy**: `databricks bundle deploy -t dev --profile ` -- if `PERMISSION_DENIED`, check workspace permissions and profile
 3. **Run**: `databricks bundle run -t dev --profile `
-4. **Check run status**: `databricks jobs get-run --run-id --profile `
+4. **Check run status**: `databricks jobs get-run --run-id --profile ` -- if `FAILED`, check `run_page_url` for task-level errors

 ## Documentation

diff --git a/skills/databricks-lakebase/SKILL.md b/skills/databricks-lakebase/SKILL.md
index fba9381..cbcd9b1 100644
--- a/skills/databricks-lakebase/SKILL.md
+++ b/skills/databricks-lakebase/SKILL.md
@@ -1,10 +1,10 @@
 ---
 name: databricks-lakebase
-description: "Databricks Lakebase Postgres: projects, scaling, connectivity, Lakebase synced tables, and Data API. Use when asked about Lakebase databases, OLTP storage, or connecting apps to Postgres on Databricks."
+description: "Create and manage Databricks Lakebase Postgres projects, configure scaling and connectivity, set up Lakebase synced tables, and query via the Data API. Use when asked about Lakebase databases, OLTP storage, or connecting apps to Postgres on Databricks."
 compatibility: Requires databricks CLI (>= v0.294.0)
 metadata:
   version: "0.1.0"
-parent: databricks-core
+  parent: databricks-core
 ---

 # Lakebase Postgres Autoscaling
@@ -17,17 +17,7 @@ Lakebase is Databricks' serverless Postgres-compatible database, available on bo

 **Compliance:** Supports HIPAA, C5, TISAX, or None.

-## Capabilities
-
-- **Project lifecycle** -- create, update, delete Lakebase Postgres Autoscaling projects
-- **Branching** -- copy-on-write branches with TTL, point-in-time recovery, and reset
-- **Compute scaling** -- autoscale 0.5--32 CU, fixed 36--112 CU, scale-to-zero
-- **High availability** -- 1 primary + 1--3 secondaries, automatic failover
-- **PostgreSQL connectivity** -- OAuth token refresh, connection pooling, SSL
-- **Data API** -- PostgREST-compatible HTTP CRUD (Autoscaling only)
-- **Lakebase synced tables** -- sync Unity Catalog Delta tables into Postgres (previously known as Reverse ETL)
-- **Databricks App integration** -- scaffold apps with Lakebase feature, deploy-first workflow
-- **Cloud support** -- AWS and Azure (GA)
+**Capabilities:** Project lifecycle, copy-on-write branching (TTL, point-in-time recovery), autoscale 0.5--112 CU with scale-to-zero, HA with 1--3 secondaries, OAuth-based Postgres connectivity, PostgREST Data API, Lakebase synced tables (Delta-to-Postgres), Databricks App integration, AWS and Azure (GA).

 **Reference docs:**
 - [computes-and-scaling.md](references/computes-and-scaling.md) — Sizing, endpoint management, scale-to-zero, HA
@@ -144,22 +134,9 @@ databricks postgres reset-branch projects//branches/ --pr

 **Delete:** Protected branches must be unprotected first (`update-branch` to set `spec.is_protected` to `false`). Cannot delete branches with children. **Never delete the `production` branch.**

-## Key Differences from Lakebase Provisioned
-
-> All new instances default to Autoscaling as of March 2026. Automatic migration of Provisioned instances begins June 2026.
-
-| Aspect | Provisioned | Autoscaling |
-|--------|-------------|-------------|
-| CLI group | `databricks database` | `databricks postgres` |
-| Top-level resource | Instance | Project |
-| Capacity | CU_1--CU_8 (16 GB/CU) | 0.5--112 CU (2 GB/CU) |
-| Branching | Not supported | Full support |
-| Scale-to-zero | Not supported | Configurable |
-| HA | Readable secondaries | 1--3 secondaries + read replicas |
-| Data API | Not available | PostgREST HTTP API |
-| Cloud | AWS only | AWS and Azure |
+## Provisioned vs Autoscaling

-**Migration:** Manual via `pg_dump`/`pg_restore` (requires pausing writes). Automatic seamless upgrades (seconds of downtime) begin June 2026 -- no customer action required.
+All new instances default to Autoscaling (March 2026). Provisioned uses `databricks database` CLI; Autoscaling uses `databricks postgres`. Autoscaling adds branching, scale-to-zero, Data API, and Azure support. Automatic migration begins June 2026.

 ## What's Next

diff --git a/skills/databricks-pipelines/SKILL.md b/skills/databricks-pipelines/SKILL.md
index d08d0b1..b843d33 100644
--- a/skills/databricks-pipelines/SKILL.md
+++ b/skills/databricks-pipelines/SKILL.md
@@ -1,10 +1,10 @@
 ---
 name: databricks-pipelines
-description: Develop Lakeflow Spark Declarative Pipelines (formerly Delta Live Tables) on Databricks. Use when building batch or streaming data pipelines with Python or SQL. Invoke BEFORE starting implementation.
+description: "Develop Lakeflow Spark Declarative Pipelines (formerly Delta Live Tables) on Databricks: create streaming tables, materialized views, configure data quality expectations, and set up Auto Loader or Auto CDC. Use when building batch or streaming data pipelines with Python or SQL. Invoke BEFORE starting implementation."
 compatibility: Requires databricks CLI (>= v0.292.0)
 metadata:
   version: "0.1.0"
-parent: databricks-core
+  parent: databricks-core
 ---

 # Lakeflow Spark Declarative Pipelines Development
@@ -181,29 +181,7 @@ databricks bundle init lakeflow-pipelines --config-file <(echo '{"project_name":
 - SQL: Recommended for straightforward transformations (filters, joins, aggregations)
 - Python: Recommended for complex logic (custom UDFs, ML, advanced processing)

-After scaffolding, create `CLAUDE.md` and `AGENTS.md` in the project directory. These files are essential to provide agents with guidance on how to work with the project. Use this content:
-
-```
-# Declarative Automation Bundles Project
-
-This project uses Declarative Automation Bundles (formerly Databricks Asset Bundles) for deployment.
-
-## Prerequisites
-
-Install the Databricks CLI (>= v0.288.0) if not already installed:
-- macOS: `brew tap databricks/tap && brew install databricks`
-- Linux: `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh`
-- Windows: `winget install Databricks.DatabricksCLI`
-
-Verify: `databricks -v`
-
-## For AI Agents
-
-Read the `databricks-core` skill for CLI basics, authentication, and deployment workflow.
-Read the `databricks-pipelines` skill for pipeline-specific guidance.
-
-If skills are not available, install them: `databricks experimental aitools install`
-```
+After scaffolding, create `CLAUDE.md` and `AGENTS.md` pointing agents to the `databricks-core` and `databricks-pipelines` skills.

 ## Pipeline Structure
