Skip to content

skills(execution-compute): drop bundled compute.py, document the CLI surface#90

Open
jamesbroadhead wants to merge 5 commits into
mainfrom
jb/drop-compute-py
Open

skills(execution-compute): drop bundled compute.py, document the CLI surface#90
jamesbroadhead wants to merge 5 commits into
mainfrom
jb/drop-compute-py

Conversation

@jamesbroadhead
Copy link
Copy Markdown
Contributor

Summary

Per Lennart's audit on #73, item #7: "for getting this out of experimental I think we need to just rely on the Databricks CLI. And if that somehow has gaps then we should fix them!"

Doing this now rather than waiting for stable promotion — the CLI already covers everything scripts/compute.py wrapped, just verbosely for the cluster-code-execution path.

Removed

experimental/databricks-execution-compute/scripts/compute.py (743 lines, three subcommands).

Replacement mapping

compute.py subcommand CLI replacement
list-compute --resource clusters databricks clusters list --output json
list-compute --resource node-types databricks clusters list-node-types
list-compute --resource spark-versions databricks clusters spark-versions
list-compute --auto-select databricks clusters list --cluster-sources UI,API --output json | jq '.[] | select(.state=="RUNNING")'
manage-cluster --action create/start/restart/terminate/delete databricks clusters create/start/restart/delete/permanent-delete
execute-code --compute-type serverless (upload+submit+wait+cleanup) The existing databricks workspace import + databricks jobs submit + databricks jobs get-run + databricks jobs get-run-output flow that SKILL.md already documents. The "Convenience wrapper" section pointing at compute.py is removed.
execute-code --compute-type cluster (code on a running cluster with persisted context) Multi-step databricks api post /api/1.2/contexts/create/api/1.2/commands/execute → poll /api/1.2/commands/status → optional /api/1.2/contexts/destroy. Full recipe (create context, submit, poll, fetch results, follow-up commands reusing contextId, %pip install, language switching) added to references/3-interactive-cluster.md.

Other changes

  • Removed the <SKILL_ROOT> path-convention note from SKILL.md (was only there to reference scripts/compute.py paths).
  • "CLI Commands" table replaced with a more useful "CLI Command Map" covering upload, submit, run state, run output, cluster lifecycle, code-execution-on-cluster, and warehouses.
  • Common-issues table for serverless updated to reference the tasks[].timeout_seconds field on submit JSON instead of --timeout on the wrapper.

Test plan

  • python3 scripts/skills.py generate clean.
  • python3 scripts/skills.py validate passes.
  • CI green.
  • Reviewer sanity-check of the cluster-execution recipe in references/3-interactive-cluster.md.

This pull request and its description were written by Claude.

…surface

Per Lennart's audit on #73 (item #7): "for getting this out of
experimental I think we need to just rely on the Databricks CLI."
Going ahead and doing this now rather than waiting for stable
promotion — the CLI already covers everything compute.py wrapped, just
verbosely for the cluster-code-execution path.

Removed: experimental/databricks-execution-compute/scripts/compute.py
(743 lines, three subcommands).

Replacement mapping:

- `compute.py list-compute` (clusters/node-types/spark-versions)
  → `databricks clusters list/list-node-types/spark-versions`
- `compute.py manage-cluster --action ...`
  → `databricks clusters create/start/restart/delete/permanent-delete/resize`
- `compute.py execute-code --compute-type serverless` (upload+submit+
  wait+cleanup)
  → existing `databricks workspace import` + `databricks jobs submit
  [--no-wait]` + `databricks jobs get-run` + `databricks jobs
  get-run-output` flow (already documented in SKILL.md / 2-serverless-job.md;
  the "Convenience wrapper" section that pointed at compute.py is
  removed).
- `compute.py execute-code --compute-type cluster` (code execution on a
  running cluster with persisted context)
  → multi-step `databricks api post /api/1.2/contexts/create` →
  `/api/1.2/commands/execute` → poll `/api/1.2/commands/status` →
  optional `/api/1.2/contexts/destroy`. The full recipe (create
  context, submit, poll, fetch results, follow-up commands reusing
  contextId, %pip install, language switching) is in
  references/3-interactive-cluster.md.

Side effects:
- Removed the `<SKILL_ROOT>` path-convention note from SKILL.md (was
  only there to reference scripts/compute.py paths).
- Added a "CLI Command Map" table to SKILL.md replacing the
  three-row compute.py table — now covers upload/submit/run-state/
  output, cluster list/get/start/stop/create, code-execution-on-cluster,
  and warehouses.

Manifest regenerated. `python3 scripts/skills.py validate` passes.

Co-authored-by: Isaac
Copy link
Copy Markdown
Collaborator

@lennartkats-db lennartkats-db left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments

Run code on Databricks. Three execution modes—choose based on workload.

> **Path convention:** `<SKILL_ROOT>` in examples below = the directory containing this SKILL.md. Resolve it to the absolute path in your install (e.g. `~/.claude/skills/databricks-execution-compute`). Commands like `python <SKILL_ROOT>/scripts/compute.py ...` work from any cwd.
Run code on Databricks. Three execution modes—choose based on workload. All examples below use the Databricks CLI; install with `pip install databricks-cli` (or follow the workspace-native `databricks` quickstart) and authenticate with `databricks auth login` (see the parent `databricks-core` skill for profile setup).
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This installation instruction is for the legacy CLI; we don't use pip anymore

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Claude here, on James's behalf.)

Fixed in 3d4d88b — dropped the pip install databricks-cli reference and now point readers at the databricks-core skill for install + auth.

Also did a wider grep — found one more carryover in skills/databricks-pipelines/references/workflows.md (came in from the a-d-k port in #85, originally databricks-skills/databricks-spark-declarative-pipelines/references/1-project-initialization.md:418 upstream). Fixed in the same commit. There's also a stale error message in ai-dev-kit's databricks-mcp-server/databricks_mcp_server/tools/workspace.py:211 recommending the same legacy package — that one's out of scope here but worth flagging upstream.

| Dependencies listed but not installed | `"client": "1"` silently drops `dependencies`; use `"client": "4"` |
| `get-run-output` returns empty `notebook_output` | You passed the parent run_id, not `.tasks[0].run_id` |
| Job times out | Default 1800 s on the script wrapper; raise `--timeout` or use `jobs submit --no-wait` + your own polling |
| Job times out | Use `databricks jobs submit --no-wait` and poll `get-run` yourself, or set `tasks[].timeout_seconds` in the submit JSON to extend the per-task limit |
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

databricks api get … --json '{…}' for status polling is fishy. In references/3-interactive-cluster.md §3, the polling loop sends a JSON body on an HTTP GET to /api/1.2/commands/status.
  That endpoint takes clusterId/contextId/commandId as query string parameters, and HTTP GET bodies are usually dropped. The replacement should be:
  databricks api get "/api/1.2/commands/status?clusterId=$CID&contextId=$CTX&commandId=$CMD

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Claude here.)

Confirmed your read and fixed in 7ebcdb0.

Looked at alternatives before patching: the modern Databricks CLI doesn't ship a command-execution subcommand (just checked v0.299.2 — no group covers the 1.2 endpoints), and the modern SDKs route the same way under the hood. So databricks api get with query-string params is the only correct surface for this legacy API. Done as you suggested:

databricks api get "/api/1.2/commands/status?clusterId=${CID}&contextId=${CTX}&commandId=${CMD}"

Added a one-liner above the snippet noting why the GET body has to go in the URL.

That PyPI package is the legacy CLI; the modern CLI is a binary. Per
@lennartkats-db review on #90, point readers at the `databricks-core`
skill (which has install + auth references) instead.

Two hits fixed:
- experimental/databricks-execution-compute/SKILL.md intro
- skills/databricks-pipelines/references/workflows.md troubleshooting
  table (carried over from the a-d-k port in #85)

Co-authored-by: Isaac
Per @lennartkats-db review: that endpoint takes clusterId/contextId/
commandId as query string parameters, and HTTP GET bodies are typically
dropped by intermediaries. The legacy 1.2 execution-context API is the
only surface here (no native CLI subcommand exists), so the fix is to
move the params into the URL.

Co-authored-by: Isaac
simonfaltum pushed a commit that referenced this pull request May 27, 2026
… from a-d-k#534 (#97)

## Summary

Mirrors the four fixes shipping in
[databricks-solutions/ai-dev-kit#534](databricks-solutions/ai-dev-kit#534)
to the same files under `experimental/`. Each of these files was carried
over verbatim from a-d-k and still had the bugs that PR fixes.

## Files

| File | Fix |
|------|-----|
| `experimental/databricks-apps-python/examples/fm-parallel-calls.py` |
guard div-by-zero on speedup calc when `total_time` is below clock
resolution; convert trailing module-level docstring (unreachable as
docs) into a real comment block |
| `experimental/databricks-apps-python/examples/llm_config.py` | stop
echoing the raw token-endpoint response and HTTP body in error messages
— they can contain credentials. Surface the shape (payload keys / status
code) instead |
|
`experimental/databricks-python-sdk/examples/5-serving-and-vector-search.py`
| replace syntactically-broken example `query_vector=[0.1, 0.2, 0.3,
...]` (literal Ellipsis tuple) with a real `list[float]` stand-in + a
comment about matching the index's embedding dimension |
| `experimental/databricks-apps-python/examples/fm-minimal-chat.py` |
fix `streamlit run 2-minimal-chat-app.py` / `command: ["streamlit",
"run", "2-minimal-chat-app.py"]` in the docstring — the actual filename
is `fm-minimal-chat.py`. Skill-name reference left as
`databricks-model-serving` to match DAS naming |

The `compute.py` fix from #534 doesn't apply here — that file was
removed in #90.

## Test plan

- [x] `python3 scripts/skills.py validate` clean
- [ ] CI green

This pull request and its description were written by Claude.

```bash
python <SKILL_ROOT>/scripts/compute.py list-compute --resource clusters
databricks clusters list --output json | jq '.[] | select(.state == "RUNNING") | {cluster_id, cluster_name, state, cluster_source}'
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe we need to filter (server side) by default on
--cluster-sources UI,API
Otherwise it's super slow in our env so we want to have this default in the example

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 377b1c1. Also applied to line 163 which had the same issue — line 36 already had the filter so just bringing the other two in line with that convention.

@jamesbroadhead
Copy link
Copy Markdown
Contributor Author

@lennartkats-db — (Claude on James's behalf.) Both your review comments are addressed:

  • pip install databricks-cli → dropped; points at the databricks-core skill for install + auth (commit 3d4d88b).
  • GET-with-body on /api/1.2/commands/status → rewritten to use query-string params; the modern CLI has no command-execution subcommand so this is the canonical surface (commit 7ebcdb0).

Replies on both threads. Quentin approved separately. When you have a minute, re-review so this can land.

…,API

Per @QuentinAmbard: on busy workspaces the cluster listing is
dominated by job clusters and the unfiltered API call is slow.
Apply the server-side filter to the "first action" snippet at
line 24 and the secondary reference at line 163, matching the
convention the file already uses at line 36.

This PR was prepared by Claude.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants