37 changes: 28 additions & 9 deletions manifest.json
@@ -1,12 +1,12 @@
{
"version": "2",
-"updated_at": "2026-04-30T11:02:41Z",
+"updated_at": "2026-05-04T13:00:55Z",
"skills": {
"databricks-apps": {
"version": "0.1.1",
"description": "Databricks Apps development and deployment (evaluates analytics vs synced tables data access)",
"experimental": false,
-"updated_at": "2026-04-30T11:00:26Z",
+"updated_at": "2026-04-30T11:19:36Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -33,7 +33,7 @@
"version": "0.1.0",
"description": "Core Databricks skill for CLI, auth, and data exploration",
"experimental": false,
-"updated_at": "2026-04-23T13:47:44Z",
+"updated_at": "2026-05-04T12:38:42Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -48,7 +48,7 @@
"version": "0.0.0",
"description": "Declarative Automation Bundles (DABs) for deploying and managing Databricks resources",
"experimental": false,
-"updated_at": "2026-04-23T13:47:44Z",
+"updated_at": "2026-04-30T11:19:36Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -66,7 +66,7 @@
"version": "0.1.0",
"description": "Databricks Jobs orchestration and scheduling",
"experimental": false,
-"updated_at": "2026-04-23T13:47:44Z",
+"updated_at": "2026-04-30T11:19:36Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -78,7 +78,7 @@
"version": "0.1.0",
"description": "Databricks Lakebase Postgres: projects, scaling, connectivity, synced tables, and Data API",
"experimental": false,
-"updated_at": "2026-04-30T11:02:37Z",
+"updated_at": "2026-04-30T11:19:36Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -93,7 +93,7 @@
"version": "0.1.0",
"description": "Databricks Model Serving endpoint management",
"experimental": false,
-"updated_at": "2026-04-23T13:47:44Z",
+"updated_at": "2026-04-30T11:19:36Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -105,7 +105,7 @@
"version": "0.1.0",
"description": "Databricks Pipelines (DLT) for ETL and streaming",
"experimental": false,
-"updated_at": "2026-04-23T13:47:44Z",
+"updated_at": "2026-04-30T11:19:36Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -152,7 +152,7 @@
"version": "0.1.0",
"description": "Migrate Databricks workloads from classic compute to serverless compute, including compatibility checks and concrete fixes",
"experimental": false,
-"updated_at": "2026-04-24T15:10:23Z",
+"updated_at": "2026-04-30T11:19:36Z",
"files": [
"SKILL.md",
"agents/openai.yaml",
@@ -164,6 +164,25 @@
"references/networking-and-security.md",
"references/streaming-migration.md"
]
+},
+"databricks-unitycatalog": {
+"version": "0.1.0",
+"description": "Unity Catalog governance: discovery, grants, volumes, external locations, lineage, and UC-managed objects",
+"experimental": false,
+"updated_at": "2026-05-04T12:42:21Z",
+"files": [
+"SKILL.md",
+"agents/openai.yaml",
+"assets/databricks.png",
+"assets/databricks.svg",
+"references/access-control.md",
+"references/ai-ml-objects.md",
+"references/lineage-and-observability.md",
+"references/namespace-and-objects.md",
+"references/operations-and-migration.md",
+"references/storage-and-connections.md",
+"references/volumes.md"
+]
}
}
}
49 changes: 49 additions & 0 deletions skills/databricks-unitycatalog/SKILL.md
@@ -0,0 +1,49 @@
---
name: databricks-unitycatalog
description: "Unity Catalog governance operations: discovery, grants, volumes, external locations, and UC object workflows."
compatibility: Requires databricks CLI (>= v0.292.0)
metadata:
  version: "0.1.0"
  parent: databricks-core
---

# Databricks Unity Catalog

**FIRST**: Use the parent `databricks-core` skill for CLI basics, authentication, and profile selection.

Use this skill for Unity Catalog governance and day-2 operations: namespaces and objects, discovery, grants/privileges, volumes, external locations, storage credentials, lineage/observability, and UC-managed AI/ML objects.

## Required Reading by Task

| Task | READ BEFORE proceeding |
|------|------------------------|
| Discover catalogs/schemas/tables; search metadata | [Namespace & discovery](references/namespace-and-objects.md) |
| Grants, privileges, ownership/MANAGE, RLS/CLS | [Access control](references/access-control.md) |
| Read/write files via Volumes | [Volumes](references/volumes.md) |
| External locations, storage credentials, federation, sharing | [Storage & connections](references/storage-and-connections.md) |
| Lineage, tags, audit logs, cost attribution | [Lineage & observability](references/lineage-and-observability.md) |
| Maintenance, time travel, migration, constraints, clone | [Operations & migration](references/operations-and-migration.md) |
| Models, functions, vector search, feature tables | [AI & ML objects](references/ai-ml-objects.md) |

## Priorities (P1 → P3)

- **P1**: Access control (grants/privileges), volumes + external locations, and metadata discovery (`information_schema`)
- **P2**: Lineage/observability (tags, audit logs), federation/sharing patterns, and operational best practices
- **P3**: Billing and cost attribution patterns (system tables)

## Key gotchas (do not skip)

- **CLI args**: many UC list/get commands use **positional** arguments (see parent `databricks-core` quick reference).
- **File privileges**: **`WRITE FILES` requires `READ FILES`** (common cause of confusing permission errors).
- **Discovery without data**: `BROWSE` enables seeing objects without reading table data.
- **Ownership vs MANAGE**: these are not interchangeable; confirm which is required for the operation.

## Reference Guides

- [Namespace & discovery](references/namespace-and-objects.md)
- [Access control](references/access-control.md)
- [Volumes](references/volumes.md)
- [Storage & connections](references/storage-and-connections.md)
- [Lineage & observability](references/lineage-and-observability.md)
- [Operations & migration](references/operations-and-migration.md)
- [AI & ML objects](references/ai-ml-objects.md)
7 changes: 7 additions & 0 deletions skills/databricks-unitycatalog/agents/openai.yaml
@@ -0,0 +1,7 @@
interface:
  display_name: "Databricks Unity Catalog"
  short_description: "UC governance: grants, volumes, external locations"
  icon_small: "./assets/databricks.svg"
  icon_large: "./assets/databricks.png"
  brand_color: "#FF3621"
  default_prompt: "Use $databricks-unitycatalog for Unity Catalog governance tasks (grants, volumes, external locations, discovery)."
3 changes: 3 additions & 0 deletions skills/databricks-unitycatalog/assets/databricks.svg
75 changes: 75 additions & 0 deletions skills/databricks-unitycatalog/references/access-control.md
@@ -0,0 +1,75 @@
# Access control (grants, privileges, RLS/CLS)

## When to use this reference

Use this doc for:

- Grant/revoke workflows (`GRANT`, `REVOKE`, `SHOW GRANTS`)
- “I can’t see the table” vs “I can’t query the table” debugging
- Permissions on volumes / external locations (file privileges)
- Row and column-level security (row filters, column masks)

## Core concepts (keep straight)

- **Privileges** are granted on UC securables (catalogs, schemas, tables/views, volumes, external locations, functions, etc.).
- **Discovery** can be separated from data access via `BROWSE`.
- **Namespace traversal** often requires `USE CATALOG` + `USE SCHEMA` even when `SELECT` exists.
- **Ownership** is not the same as `MANAGE` (workspaces differ on what each enables).
- **File privileges gotcha**: **`WRITE FILES` requires `READ FILES`**.

## Quick checklist: “why can’t user X query object Y?”

1. Confirm the user/principal identity (`<principal>`).
2. Check grants on:
   - catalog + schema (traversal / discovery)
   - the target object (table/view/volume/external location)
3. If the error mentions files/paths, verify file privileges (`READ FILES`, `WRITE FILES`) and underlying external location grants.
4. If the query returns fewer rows or masked values, check row filters / column masks.
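
The privilege part of this checklist can be sketched as a small triage helper. This is an illustration in Python, not a Databricks API: the `Grant` tuple, the `missing_for_select` function, and the sample grants are hypothetical, and the required set (`USE CATALOG`, `USE SCHEMA`, `SELECT`) comes from this document. It assumes a privilege granted on a parent securable also applies to its children, and it ignores group membership, ownership, file privileges, and row/column policies, which a real check would also need to consider.

```python
from typing import NamedTuple


class Grant(NamedTuple):
    """One grant held by the principal (hypothetical shape, e.g. parsed from SHOW GRANTS)."""
    privilege: str   # e.g. "SELECT"
    securable: str   # full name, e.g. "main.sales.orders"


def missing_for_select(table: str, grants: list[Grant]) -> list[str]:
    """Return the privileges a principal still lacks to query `table`.

    Assumes `table` is a three-level name (catalog.schema.table).
    """
    catalog, schema, _ = table.split(".")
    schema_name = f"{catalog}.{schema}"
    # Each required privilege maps to the securables where a grant would satisfy it.
    required = {
        "USE CATALOG": {catalog},
        "USE SCHEMA": {catalog, schema_name},
        "SELECT": {catalog, schema_name, table},
    }
    held = {(g.privilege.upper(), g.securable) for g in grants}
    return [
        privilege
        for privilege, scopes in required.items()
        if not any((privilege, scope) in held for scope in scopes)
    ]


# Sample data (made up): SELECT exists, but schema traversal is missing.
grants = [
    Grant("USE CATALOG", "main"),
    Grant("SELECT", "main.sales.orders"),
]
print(missing_for_select("main.sales.orders", grants))  # ['USE SCHEMA']
```

An empty result means the base privileges are in place, so the remaining checklist steps (file privileges, row filters, column masks) are the next place to look.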

## Common SQL patterns

```sql
-- Inspect grants (examples)
SHOW GRANTS ON CATALOG <catalog>;
SHOW GRANTS ON SCHEMA <catalog>.<schema>;
SHOW GRANTS ON TABLE <catalog>.<schema>.<table>;
SHOW GRANTS ON VIEW <catalog>.<schema>.<view>;
SHOW GRANTS ON VOLUME <catalog>.<schema>.<volume>;

-- Minimal traversal + discovery (lets users find objects)
GRANT USE CATALOG ON CATALOG <catalog> TO `<principal>`;
GRANT USE SCHEMA ON SCHEMA <catalog>.<schema> TO `<principal>`;
GRANT BROWSE ON CATALOG <catalog> TO `<principal>`;

-- Data access
GRANT SELECT ON TABLE <catalog>.<schema>.<table> TO `<principal>`;

-- Revoke
REVOKE SELECT ON TABLE <catalog>.<schema>.<table> FROM `<principal>`;
```
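
When the same grants are issued for many principals, it can help to generate the statements rather than hand-edit the placeholders. The sketch below is one way to do that in Python; `quote_principal` and `read_only_grants` are hypothetical helper names, and the catalog, schema, and table names are assumed to be plain identifiers that need no quoting. It only builds strings and does not execute anything against a workspace.

```python
def quote_principal(principal: str) -> str:
    """Wrap a principal in backticks, doubling any embedded backtick."""
    return "`" + principal.replace("`", "``") + "`"


def read_only_grants(catalog: str, schema: str, table: str, principal: str) -> list[str]:
    """Build the traversal and SELECT grants from the patterns above."""
    p = quote_principal(principal)
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {p};",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {p};",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO {p};",
    ]


# Example principal and object names are placeholders.
for statement in read_only_grants("main", "sales", "orders", "analysts"):
    print(statement)
```

Review the generated statements before running them, and confirm the outcome with `SHOW GRANTS`.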

### Troubleshooting: “not found” vs “permission denied”

- **“Not found” / can’t list** often means missing `USE CATALOG` / `USE SCHEMA` and/or `BROWSE`.
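- **“Permission denied” on query** usually means a missing `SELECT` on the object or missing file privileges on the underlying storage path. A row filter or column mask can also be involved, though these usually show up as fewer rows or masked values rather than as an error.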

## `ALL PRIVILEGES` notes

`ALL PRIVILEGES` is a shorthand, and what it actually grants depends on the securable type and on platform semantics. Prefer granting only the privileges a principal needs, then verify the result with `SHOW GRANTS`.

## Ownership vs `MANAGE`

Document which operations require ownership vs `MANAGE` in your environment. Do not assume one implies the other.

## RLS/CLS: row filters + column masks

Unity Catalog can enforce:

- **Row filters**: restrict which rows a principal can see
- **Column masks**: redact/transform specific columns

Debug workflow:

- Start with a minimal query selecting non-sensitive columns
- If results differ by principal, inspect applicable row/column policies
- Confirm base privileges first (`USE CATALOG`, `USE SCHEMA`, `SELECT`)
62 changes: 62 additions & 0 deletions skills/databricks-unitycatalog/references/ai-ml-objects.md
@@ -0,0 +1,62 @@
# AI & ML objects in Unity Catalog (models, functions, vector, features)

## When to use this reference

Use this doc when working with UC-governed AI/ML primitives:

- registered models
- UC functions (including those used as governed “tools”)
- vector search indexes
- feature tables and online store publishing (if applicable)

## Registered models (governance mindset)

UC can govern registered models and their lifecycle (versions, aliases/stages depending on setup). Treat model governance similarly to table governance:

- who can read / write / deploy
- how changes are audited
- how environments (dev/stage/prod) are separated

## UC functions as governed tools

UC functions can act as a controlled “tool surface”: the function signature defines what a caller may pass in and get back, and `EXECUTE` grants define who may call it.

Checklist:

- Add a clear `COMMENT` describing safe usage and inputs/outputs.
- Ensure callers have `EXECUTE` privilege (and only what they need).
- Avoid designs that require embedding secrets in function bodies or configs.

## Python UDFs / UDTFs (validate constraints early)

Support, packaging, and runtime constraints vary by environment. Validate:

- runtime compatibility
- dependency strategy (what can/can’t be packaged)
- permissions (who can create/alter/execute)

## Vector Search indexes

Common patterns:

- Direct Vector Access index: you write and update the vectors yourself
- Delta Sync index: the index is refreshed from a source Delta table

Pick based on freshness requirements and operational overhead.

## Feature tables / online store publishing

Typical workflow:

- curate feature tables with stable keys and definitions
- publish/sync to an online store (if used)

Confirm which feature APIs your workspace supports and which principal will run publish/sync jobs (human vs service principal).

## External access from functions/UDFs

If functions/UDFs access external cloud services:

- keep credentials out of code (no embedded tokens/secrets)
- confirm egress/networking policies allow access
- enforce least privilege and auditability
66 changes: 66 additions & 0 deletions skills/databricks-unitycatalog/references/lineage-and-observability.md
@@ -0,0 +1,66 @@
# Lineage & observability (metadata, tags, audit, billing)

## When to use this reference

Use this doc when you need to:

- Verify lineage exists for a table/model/dashboard/pipeline
- Bring lineage from external systems (or document gaps)
- Apply or audit tags (system vs governed)
- Investigate access and permission changes via audit logs
- Attribute costs using system billing tables

## Automated lineage (how to reason about it)

Unity Catalog can capture lineage across common compute and platform surfaces (tables, pipelines, dashboards, models). Coverage varies by feature/integration.

Checklist:

- Validate lineage on a representative object first (don’t assume global coverage).
- If lineage is missing, determine whether it’s a tooling gap, a permissions gap, or an unsupported integration path.

## External lineage (BYO)

For systems outside Databricks (BI tools, SaaS sources, external warehouses), use external lineage ingestion where available. If not possible, document:

- what lineage will remain missing
- what identifiers can be used to correlate (table names, URLs, workbook IDs, etc.)

## Tags (system vs governed)

- **System tags**: platform-generated metadata.
- **Governed tags**: curated taxonomy with controlled assignment.

When using governed tags, principals may require privileges such as:

- `APPLY TAG`
- an assignment permission (often called `ASSIGN`) depending on the governed-tag system in use

## Audit logs (`system.access.audit`)

Use audit logs to answer “who did what, when” and to diagnose unexpected permission/access patterns.

```sql
-- Recent grant/revoke-related actions
SELECT *
FROM system.access.audit
WHERE event_time >= current_timestamp() - INTERVAL 7 DAYS
  AND (
    lower(action_name) LIKE '%grant%'
    OR lower(action_name) LIKE '%revoke%'
  )
ORDER BY event_time DESC
LIMIT 200;
```

## Billing / cost attribution (`system.billing.usage`)

Use usage tables for cost attribution by workspace, identity, SKU, and time range.

```sql
SELECT *
FROM system.billing.usage
WHERE usage_start_time >= current_timestamp() - INTERVAL 30 DAYS
ORDER BY usage_start_time DESC
LIMIT 200;
```
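
The query above returns raw usage rows, so attribution usually needs a grouping step. The Python sketch below shows that step over rows that have already been fetched. The `usage_by` helper and the sample rows are hypothetical, and the field names `sku_name`, `workspace_id`, and `usage_quantity` are assumed to match the columns returned by the query.

```python
from collections import defaultdict
from decimal import Decimal


def usage_by(rows: list[dict], key: str) -> dict:
    """Sum usage_quantity per distinct value of `key` (e.g. "sku_name")."""
    totals = defaultdict(Decimal)  # missing keys start at Decimal("0")
    for row in rows:
        # Go through str() so float inputs do not carry binary rounding noise.
        totals[row[key]] += Decimal(str(row["usage_quantity"]))
    return dict(totals)


# Made-up sample rows for illustration only.
rows = [
    {"workspace_id": "1", "sku_name": "JOBS", "usage_quantity": 1.5},
    {"workspace_id": "1", "sku_name": "SQL", "usage_quantity": 4},
    {"workspace_id": "2", "sku_name": "JOBS", "usage_quantity": 2.0},
]
print(usage_by(rows, "sku_name"))
print(usage_by(rows, "workspace_id"))
```

For large time ranges, the same grouping is better done in SQL with `GROUP BY` so that only the totals leave the warehouse.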