From a1afc62eb6466690416b20c22ed1d6f09b25cd4b Mon Sep 17 00:00:00 2001
From: James Broadhead
Date: Wed, 27 May 2026 19:10:30 +0000
Subject: [PATCH] docs(databricks-dabs): port DAB content vendored in devhub

Adds material that previously lived only in devhub's vendored copy of
`declarative-automation-bundles.md`:

- `bundle init` workflow + the built-in template list
- `bundle generate` for adopting existing resources
- `bundle destroy`
- Classic shared-cluster job example (`job_clusters` / `job_cluster_key`)
- `registered_models` resource example
- Consolidated substitutions reference (`${var.*}`, `${bundle.*}`,
  `${workspace.current_user.*}`, `${resources.*}`)
- Troubleshooting entries for "variable shows as literal" and `--debug`
  for unclear validation errors

With this in DAS, devhub can drop its vendored databricks-* skills and
rely on `databricks aitools install`.

Signed-off-by: James Broadhead
---
 .../references/bundle-structure.md | 65 +++++++++++++++++++
 .../references/deploy-and-run.md | 43 ++++++++++++
 2 files changed, 108 insertions(+)

diff --git a/skills/databricks-dabs/references/bundle-structure.md b/skills/databricks-dabs/references/bundle-structure.md
index 31772d6..79aa133 100644
--- a/skills/databricks-dabs/references/bundle-structure.md
+++ b/skills/databricks-dabs/references/bundle-structure.md
@@ -99,6 +99,54 @@ resources:
         timezone_id: 'America/Los_Angeles'
 ```
 
+### Job Resource (Shared Classic Cluster)
+
+When multiple tasks should reuse the same cluster, declare it once under `job_clusters` and reference it via `job_cluster_key`:
+
+```yaml
+resources:
+  jobs:
+    sample_job:
+      name: sample_job
+      tasks:
+        - task_key: notebook_task
+          notebook_task:
+            notebook_path: ../src/sample_notebook.ipynb
+          job_cluster_key: job_cluster
+          libraries:
+            - whl: ../dist/*.whl
+
+        - task_key: main_task
+          depends_on:
+            - task_key: notebook_task
+          python_wheel_task:
+            package_name: my_project
+            entry_point: main
+          job_cluster_key: job_cluster
+          libraries:
+            - whl: ../dist/*.whl
+
+      job_clusters:
+        - job_cluster_key: job_cluster
+          new_cluster:
+            spark_version: 16.4.x-scala2.12
+            node_type_id: i3.xlarge
+            data_security_mode: SINGLE_USER
+            autoscale:
+              min_workers: 1
+              max_workers: 4
+```
+
+## Registered Model Resources
+
+```yaml
+resources:
+  registered_models:
+    customer_churn:
+      name: '${var.catalog}.${var.schema}.customer_churn_model'
+      description: 'Customer churn prediction model'
+```
+
 ## Volume Resources
 
 ```yaml
@@ -177,6 +225,23 @@ databricks bundle generate app
 
 DABs supports schemas, models, experiments, clusters, warehouses, etc. Use `databricks bundle schema` to inspect schemas.
 
+## Substitutions
+
+Substitutions are resolved at deploy time and are usable in any string field across `databricks.yml`, resource files, and variable defaults.
+
+| Substitution                            | Resolves to                                            |
+| --------------------------------------- | ------------------------------------------------------ |
+| `${var.my_variable}`                    | User-defined variable from `variables:` block          |
+| `${bundle.name}`                        | The bundle's `bundle.name`                             |
+| `${bundle.target}`                      | The active target (`dev`, `staging`, `prod`, …)        |
+| `${workspace.current_user.userName}`    | Deployer's email                                       |
+| `${workspace.current_user.short_name}`  | Deployer's short name (handle before `@`)              |
+| `${workspace.file_path}`                | Bundle's workspace file path                           |
+| `${resources.jobs..id}`                 | ID of another job in the same bundle                   |
+| `${resources.pipelines..id}`            | ID of another pipeline in the same bundle              |
+
+Variables themselves are declared in `databricks.yml` (with optional `default:` or `lookup:`) and overridden per target.
+
 ## Key Principles
 
 1. **Path resolution**: `../src/` in resources/\*.yml, `./src/` in databricks.yml
diff --git a/skills/databricks-dabs/references/deploy-and-run.md b/skills/databricks-dabs/references/deploy-and-run.md
index 61aeea3..a1e087f 100644
--- a/skills/databricks-dabs/references/deploy-and-run.md
+++ b/skills/databricks-dabs/references/deploy-and-run.md
@@ -1,5 +1,39 @@
 # Deploy and Run Declarative Automation Bundles
 
+## Initialization
+
+Start a new bundle interactively:
+
+```bash
+databricks bundle init
+```
+
+Built-in templates:
+
+| Template             | Use for                                            |
+| -------------------- | -------------------------------------------------- |
+| `default-python`     | Python project with jobs and a pipeline            |
+| `default-sql`        | SQL project with jobs                              |
+| `default-scala`      | Scala/Java project                                 |
+| `lakeflow-pipelines` | Lakeflow Declarative Pipelines (Python or SQL)     |
+| `dbt-sql`            | dbt integration                                    |
+| `default-minimal`    | Minimal bundle skeleton                            |
+
+Pass a template name or a Git URL pointing at a template directory to skip the interactive picker.
+
+## Generate from Existing Resources
+
+If a workspace already has the resource, generate its bundle YAML instead of writing it by hand:
+
+```bash
+databricks bundle generate job
+databricks bundle generate pipeline
+databricks bundle generate dashboard
+databricks bundle generate app
+```
+
+This writes a resource file under `resources/` plus any referenced source assets.
+
 ## Validation
 
 Validate bundle configuration:
@@ -32,6 +66,13 @@ Run resources:
 
 View status: `bundle summary`
 
+## Destroy
+
+`bundle destroy` removes everything the bundle previously deployed to the target workspace. It is destructive; confirm the target before running it.
+
+- `bundle destroy -t dev`
+- `bundle destroy -t prod`
+
 ## Monitoring and Logs
 
 ```bash
@@ -56,3 +97,5 @@ databricks apps logs --profile
 | **App not starting after deploy** | Apps require `databricks bundle run ` to start |
 | **App env vars not working** | Environment variables go in `app.yaml` (source dir), not databricks.yml |
 | **Debugging any app issue** | First step: `databricks apps logs ` |
+| **Variable shows as `${var.name}` literal** | Variable not declared in `databricks.yml` `variables:`, missing from the active target, or wrong syntax (use `${var.}`) |
+| **Validation errors unclear** | Re-run with `databricks bundle validate --strict --debug` |
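
Note on the "Variable shows as `${var.name}` literal" entry: a minimal sketch of a declaration that would let `${var.catalog}` resolve, assuming a variable named `catalog` and targets named `dev` and `prod`. The values `dev_catalog` and `prod_catalog` are hypothetical placeholders and are not taken from this patch.

```yaml
# databricks.yml (fragment): declare the variable once, then override it per target.
variables:
  catalog:
    description: Catalog used in resource names   # illustrative description
    default: dev_catalog                          # hypothetical default value

targets:
  dev:
    default: true              # dev uses the variable's default
  prod:
    variables:
      catalog: prod_catalog    # hypothetical per-target override
```

With a declaration of this shape, a field such as `name: '${var.catalog}.${var.schema}.customer_churn_model'` should pick up `dev_catalog` under `-t dev` and `prod_catalog` under `-t prod`, provided `schema` is declared the same way.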