skills(execution-compute): drop bundled compute.py, document the CLI surface#90
skills(execution-compute): drop bundled compute.py, document the CLI surface#90jamesbroadhead wants to merge 5 commits into
Conversation
…surface Per Lennart's audit on #73 (item #7): "for getting this out of experimental I think we need to just rely on the Databricks CLI." Going ahead and doing this now rather than waiting for stable promotion — the CLI already covers everything compute.py wrapped, just verbosely for the cluster-code-execution path. Removed: experimental/databricks-execution-compute/scripts/compute.py (743 lines, three subcommands). Replacement mapping: - `compute.py list-compute` (clusters/node-types/spark-versions) → `databricks clusters list/list-node-types/spark-versions` - `compute.py manage-cluster --action ...` → `databricks clusters create/start/restart/delete/permanent-delete/resize` - `compute.py execute-code --compute-type serverless` (upload+submit+ wait+cleanup) → existing `databricks workspace import` + `databricks jobs submit [--no-wait]` + `databricks jobs get-run` + `databricks jobs get-run-output` flow (already documented in SKILL.md / 2-serverless-job.md; the "Convenience wrapper" section that pointed at compute.py is removed). - `compute.py execute-code --compute-type cluster` (code execution on a running cluster with persisted context) → multi-step `databricks api post /api/1.2/contexts/create` → `/api/1.2/commands/execute` → poll `/api/1.2/commands/status` → optional `/api/1.2/contexts/destroy`. The full recipe (create context, submit, poll, fetch results, follow-up commands reusing contextId, %pip install, language switching) is in references/3-interactive-cluster.md. Side effects: - Removed the `<SKILL_ROOT>` path-convention note from SKILL.md (was only there to reference scripts/compute.py paths). - Added a "CLI Command Map" table to SKILL.md replacing the three-row compute.py table — now covers upload/submit/run-state/ output, cluster list/get/start/stop/create, code-execution-on-cluster, and warehouses. Manifest regenerated. `python3 scripts/skills.py validate` passes. Co-authored-by: Isaac
| Run code on Databricks. Three execution modes—choose based on workload. | ||
|
|
||
| > **Path convention:** `<SKILL_ROOT>` in examples below = the directory containing this SKILL.md. Resolve it to the absolute path in your install (e.g. `~/.claude/skills/databricks-execution-compute`). Commands like `python <SKILL_ROOT>/scripts/compute.py ...` work from any cwd. | ||
| Run code on Databricks. Three execution modes—choose based on workload. All examples below use the Databricks CLI; install with `pip install databricks-cli` (or follow the workspace-native `databricks` quickstart) and authenticate with `databricks auth login` (see the parent `databricks-core` skill for profile setup). |
There was a problem hiding this comment.
This installation instruction is for the legacy CLI; we don't use pip anymore
There was a problem hiding this comment.
(Claude here, on James's behalf.)
Fixed in 3d4d88b — dropped the pip install databricks-cli reference and now point readers at the databricks-core skill for install + auth.
Also did a wider grep — found one more carryover in skills/databricks-pipelines/references/workflows.md (came in from the a-d-k port in #85, originally databricks-skills/databricks-spark-declarative-pipelines/references/1-project-initialization.md:418 upstream). Fixed in the same commit. There's also a stale error message in ai-dev-kit's databricks-mcp-server/databricks_mcp_server/tools/workspace.py:211 recommending the same legacy package — that one's out of scope here but worth flagging upstream.
| | Dependencies listed but not installed | `"client": "1"` silently drops `dependencies`; use `"client": "4"` | | ||
| | `get-run-output` returns empty `notebook_output` | You passed the parent run_id, not `.tasks[0].run_id` | | ||
| | Job times out | Default 1800 s on the script wrapper; raise `--timeout` or use `jobs submit --no-wait` + your own polling | | ||
| | Job times out | Use `databricks jobs submit --no-wait` and poll `get-run` yourself, or set `tasks[].timeout_seconds` in the submit JSON to extend the per-task limit | |
There was a problem hiding this comment.
databricks api get … --json '{…}' for status polling is fishy. In references/3-interactive-cluster.md §3, the polling loop sends a JSON body on an HTTP GET to /api/1.2/commands/status.
That endpoint takes clusterId/contextId/commandId as query string parameters, and HTTP GET bodies are usually dropped. The replacement should be:
databricks api get "/api/1.2/commands/status?clusterId=$CID&contextId=$CTX&commandId=$CMD
There was a problem hiding this comment.
(Claude here.)
Confirmed your read and fixed in 7ebcdb0.
Looked at alternatives before patching: the modern Databricks CLI doesn't ship a command-execution subcommand (just checked v0.299.2 — no group covers the 1.2 endpoints), and the modern SDKs route the same way under the hood. So databricks api get with query-string params is the only correct surface for this legacy API. Done as you suggested:
databricks api get "/api/1.2/commands/status?clusterId=${CID}&contextId=${CTX}&commandId=${CMD}"Added a one-liner above the snippet noting why the GET body has to go in the URL.
# Conflicts: # manifest.json
That PyPI package is the legacy CLI; the modern CLI is a binary. Per @lennartkats-db review on #90, point readers at the `databricks-core` skill (which has install + auth references) instead. Two hits fixed: - experimental/databricks-execution-compute/SKILL.md intro - skills/databricks-pipelines/references/workflows.md troubleshooting table (carried over from the a-d-k port in #85) Co-authored-by: Isaac
Per @lennartkats-db review: that endpoint takes clusterId/contextId/ commandId as query string parameters, and HTTP GET bodies are typically dropped by intermediaries. The legacy 1.2 execution-context API is the only surface here (no native CLI subcommand exists), so the fix is to move the params into the URL. Co-authored-by: Isaac
… from a-d-k#534 (#97) ## Summary Mirrors the four fixes shipping in [databricks-solutions/ai-dev-kit#534](databricks-solutions/ai-dev-kit#534) to the same files under `experimental/`. Each of these files was carried over verbatim from a-d-k and still had the bugs that PR fixes. ## Files | File | Fix | |------|-----| | `experimental/databricks-apps-python/examples/fm-parallel-calls.py` | guard div-by-zero on speedup calc when `total_time` is below clock resolution; convert trailing module-level docstring (unreachable as docs) into a real comment block | | `experimental/databricks-apps-python/examples/llm_config.py` | stop echoing the raw token-endpoint response and HTTP body in error messages — they can contain credentials. Surface the shape (payload keys / status code) instead | | `experimental/databricks-python-sdk/examples/5-serving-and-vector-search.py` | replace syntactically-broken example `query_vector=[0.1, 0.2, 0.3, ...]` (literal Ellipsis tuple) with a real `list[float]` stand-in + a comment about matching the index's embedding dimension | | `experimental/databricks-apps-python/examples/fm-minimal-chat.py` | fix `streamlit run 2-minimal-chat-app.py` / `command: ["streamlit", "run", "2-minimal-chat-app.py"]` in the docstring — the actual filename is `fm-minimal-chat.py`. Skill-name reference left as `databricks-model-serving` to match DAS naming | The `compute.py` fix from #534 doesn't apply here — that file was removed in #90. ## Test plan - [x] `python3 scripts/skills.py validate` clean - [ ] CI green This pull request and its description were written by Claude.
|
|
||
| ```bash | ||
| python <SKILL_ROOT>/scripts/compute.py list-compute --resource clusters | ||
| databricks clusters list --output json | jq '.[] | select(.state == "RUNNING") | {cluster_id, cluster_name, state, cluster_source}' |
There was a problem hiding this comment.
I believe we need to filter (server side) by default on
--cluster-sources UI,API
Otherwise it's super slow in our env so we want to have this default in the example
There was a problem hiding this comment.
Done in 377b1c1. Also applied to line 163 which had the same issue — line 36 already had the filter so just bringing the other two in line with that convention.
|
@lennartkats-db — (Claude on James's behalf.) Both your review comments are addressed:
Replies on both threads. Quentin approved separately. When you have a minute, re-review so this can land. |
…,API Per @QuentinAmbard: on busy workspaces the cluster listing is dominated by job clusters and the unfiltered API call is slow. Apply the server-side filter to the "first action" snippet at line 24 and the secondary reference at line 163, matching the convention the file already uses at line 36. This PR was prepared by Claude.
Summary
Per Lennart's audit on #73, item #7: "for getting this out of experimental I think we need to just rely on the Databricks CLI. And if that somehow has gaps then we should fix them!"
Doing this now rather than waiting for stable promotion — the CLI already covers everything
scripts/compute.pywrapped, just verbosely for the cluster-code-execution path.Removed
experimental/databricks-execution-compute/scripts/compute.py(743 lines, three subcommands).Replacement mapping
compute.pysubcommandlist-compute --resource clustersdatabricks clusters list --output jsonlist-compute --resource node-typesdatabricks clusters list-node-typeslist-compute --resource spark-versionsdatabricks clusters spark-versionslist-compute --auto-selectdatabricks clusters list --cluster-sources UI,API --output json | jq '.[] | select(.state=="RUNNING")'manage-cluster --action create/start/restart/terminate/deletedatabricks clusters create/start/restart/delete/permanent-deleteexecute-code --compute-type serverless(upload+submit+wait+cleanup)databricks workspace import+databricks jobs submit+databricks jobs get-run+databricks jobs get-run-outputflow that SKILL.md already documents. The "Convenience wrapper" section pointing atcompute.pyis removed.execute-code --compute-type cluster(code on a running cluster with persisted context)databricks api post /api/1.2/contexts/create→/api/1.2/commands/execute→ poll/api/1.2/commands/status→ optional/api/1.2/contexts/destroy. Full recipe (create context, submit, poll, fetch results, follow-up commands reusingcontextId,%pip install, language switching) added toreferences/3-interactive-cluster.md.Other changes
<SKILL_ROOT>path-convention note from SKILL.md (was only there to referencescripts/compute.pypaths).tasks[].timeout_secondsfield on submit JSON instead of--timeouton the wrapper.Test plan
python3 scripts/skills.py generateclean.python3 scripts/skills.py validatepasses.references/3-interactive-cluster.md.This pull request and its description were written by Claude.