
feat(llm): add native Databricks Unity AI Gateway provider#3286

Draft
prasadkona wants to merge 23 commits into
OpenHands:mainfrom
prasadkona:feat/databricks-native-provider

@prasadkona commented May 17, 2026

Summary

Adds a native, optimized provider (DatabricksLLM) for the Databricks Unity AI
Gateway, giving OpenHands agents governed access to the foundation models served
through a customer's Databricks workspace. The connector is Databricks PWAF
(Partner Well-Architected Framework) compliant.

This is the foundational PR. The companion CLI PR (OpenHands/OpenHands-CLI#740)
and web-app PR (OpenHands/OpenHands#14449) depend on this being merged first.


Motivation

Databricks customers access foundation models through the Unity AI Gateway,
which provides a single governed entry point — unified authentication, access
control, usage tracking, and policy enforcement — across multiple model families.

This connector lets OpenHands talk to that gateway natively, with full control over
the connection lifecycle, retry/backoff against the gateway's error contract, and
the distinct native API surfaces that different model families expose (OpenAI Chat,
Anthropic Messages, Gemini generateContent, OpenAI Responses). The result is an
optimized, governance-aware path to Databricks-served foundation models.


What's new

New package: openhands/sdk/llm/providers/databricks/

| Module | Role |
|---|---|
| llm.py | DatabricksLLM, a Pydantic subclass of LLM |
| client.py | Unity AI Gateway transport, family dispatch, retry/backoff |
| native.py | Per-family request/response adapters (OpenAI Chat, Anthropic, Gemini, Responses) |
| models.py | ProviderFamily, gateway path routing, StoredU2MTokens |
| auth.py | Credential strategies, resolve_credentials(), token providers |
| discovery.py | list_chat_endpoints / list_foundation_models with TTL cache |
| utils.py | Timeouts, retry/backoff helpers, error mapping |
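The retry/backoff responsibility of client.py and utils.py can be pictured with a minimal sketch. It assumes the gateway signals transient failures with HTTP 429 and 5xx statuses and uses full-jitter exponential backoff; `GatewayError`, `RETRYABLE_STATUS`, `backoff_delay`, and `call_with_retry` are hypothetical names, not the connector's API.

```python
import random
import time
from typing import Any, Callable

# Hypothetical set of transient statuses (rate limiting and server-side errors).
RETRYABLE_STATUS = frozenset({429, 500, 502, 503, 504})


class GatewayError(Exception):
    """Hypothetical error type carrying the HTTP status the gateway returned."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"[{status}] {message}")
        self.status = status


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_retry(
    fn: Callable[[], Any],
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fn, retrying only on retryable gateway errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    attempt = 0
    while True:
        try:
            return fn()
        except GatewayError as err:
            attempt += 1
            # Non-retryable errors (401, 403, 404, ...) and the final failure propagate.
            if err.status not in RETRYABLE_STATUS or attempt >= max_attempts:
                raise
            sleep(backoff_delay(attempt - 1))
```

Injecting `sleep` keeps the policy testable without real waiting; auth and routing errors fail fast because retrying them cannot succeed.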

Unity AI Gateway model families

| Family | When used |
|---|---|
| OPENAI | Default: all llm/v1/chat endpoints (Llama, GPT-OSS, …) |
| OPENAI_RESPONSES | GPT-5 series (`databricks-gpt-5*`) |
| ANTHROPIC | Claude models (`*claude*`) |
| GEMINI | Gemini models (`*gemini*`) |
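The name-pattern routing in the family table can be sketched as plain substring checks on the lowercased endpoint name. This is an illustration under that assumption: `detect_family` is a hypothetical name, and the connector's actual rules in models.py may be stricter.

```python
from enum import Enum


class ProviderFamily(str, Enum):
    OPENAI = "openai"                      # default: llm/v1/chat endpoints
    OPENAI_RESPONSES = "openai_responses"  # GPT-5 series
    ANTHROPIC = "anthropic"                # Claude models
    GEMINI = "gemini"                      # Gemini models


def detect_family(model: str) -> ProviderFamily:
    """Guess the gateway family from the endpoint name alone (no API call)."""
    # Drop an optional "databricks/" routing prefix and normalise case.
    name = model.removeprefix("databricks/").lower()
    if "claude" in name:
        return ProviderFamily.ANTHROPIC
    if "gemini" in name:
        return ProviderFamily.GEMINI
    # "gpt-5..." goes to the Responses API; "gpt-oss-..." does not match.
    if "gpt-5" in name:
        return ProviderFamily.OPENAI_RESPONSES
    return ProviderFamily.OPENAI
```

For example, `detect_family("databricks/databricks-gpt-oss-120b")` falls through to `ProviderFamily.OPENAI`, matching the table's note that GPT-OSS uses the chat endpoint.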

By default, routing matches the endpoint name against these patterns (no extra
API call). Setting databricks_metadata_probe=True opts into an authoritative
serving-endpoint lookup, cached with a 5-minute TTL, for external-model endpoints.
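A 5-minute TTL means each endpoint is looked up at most once per 300 seconds. A minimal sketch of such a cache, assuming a per-key loader callable; `TTLCache` and `get_or_load` are hypothetical names, not the connector's API, and the clock is injectable so expiry can be tested.

```python
import time
from typing import Callable, Dict, Generic, Optional, Tuple, TypeVar

V = TypeVar("V")


class TTLCache(Generic[V]):
    """Tiny per-key cache whose entries expire after ttl seconds."""

    def __init__(self, ttl: float = 300.0, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, V]] = {}

    def get_or_load(self, key: str, loader: Callable[[str], V]) -> V:
        now = self._clock()
        hit: Optional[Tuple[float, V]] = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no network call
        value = loader(key)  # e.g. a serving-endpoint metadata lookup
        self._entries[key] = (now, value)
        return value
```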

Auth strategies (all governed through the workspace)

  1. U2M: OAuth browser PKCE (tokens passed in from the app layer)
  2. M2M: OAuth client credentials / service principal
  3. PAT: Personal Access Token
  4. PROFILE: ~/.databrickscfg (requires the databricks-sdk optional dependency)
  5. UNIFIED: databricks-sdk auth chain (workload identity, Azure AD, …)
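For the M2M strategy, a minimal sketch of the client-credentials token request, built but not sent. It assumes the workspace-level `/oidc/v1/token` endpoint, the `all-apis` scope, and HTTP Basic client authentication, which is how Databricks documents service-principal OAuth as far as I know; `build_m2m_token_request` is a hypothetical helper, not the connector's code.

```python
import base64
import urllib.parse
import urllib.request


def build_m2m_token_request(host: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth client-credentials token request."""
    url = host.rstrip("/") + "/oidc/v1/token"
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    # The service principal's credentials go in an HTTP Basic Authorization header.
    basic = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Basic {basic}",
            "Content-Type": "application/x-www-form-urlencoded",
        },
    )
```

Sending it with `urllib.request.urlopen(req)` would return JSON containing an `access_token`, which is then used as a Bearer token against the gateway.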

Changes to existing files (minimal wiring only)

  • sdk/__init__.py: routes databricks/ model IDs to the native provider and
    everything else to the base LLM.
  • sdk/llm/llm.py: adds a model validator so that serialized agents containing
    "provider": "databricks" rehydrate as DatabricksLLM. Purely additive — no
    existing logic is changed.
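The prefix routing in sdk/__init__.py can be sketched in a few lines. The `LLM` and `DatabricksLLM` classes below are bare stand-ins and this `create_llm` signature is hypothetical; the point is only the dispatch rule that the test plan checks.

```python
class LLM:
    """Stand-in for the base LLM class."""

    def __init__(self, model: str) -> None:
        self.model = model


class DatabricksLLM(LLM):
    """Stand-in for the native Databricks provider."""


def create_llm(model: str) -> LLM:
    """Route databricks/ model IDs to the native provider, everything else to LLM."""
    if model.startswith("databricks/"):
        return DatabricksLLM(model)
    return LLM(model)
```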

Tests

  • 9 new test files under tests/sdk/llm/providers/databricks/ (279 unit tests)
  • Live integration tested against a real Databricks workspace across all 4 Unity AI
    Gateway model families and all 5 auth strategies

Test plan

  • uv run pytest tests/sdk/llm/providers/databricks/ -q — 279 passed
  • A databricks/... model ID resolves to the native Databricks provider
  • A non-Databricks model ID still resolves to the base LLM

Alignment with Databricks ucode

This integration follows the same credential model as Databricks ucode, the
Unity AI Gateway Coding CLI that launches coding agents through the Databricks
AI Gateway using workspace credentials, with no API keys required. The
connector's PROFILE and UNIFIED strategies read the workspace login a developer
has already established (databricks auth login / ~/.databrickscfg), and U2M
provides interactive browser OAuth, so an OpenHands agent can reach AI Gateway
in the same key-free, governed way ucode does, reusing the existing workspace
session rather than a separate token. The result is one consistent, governed
path to AI Gateway (and the Unity Catalog–governed resources behind it) across
ucode and OpenHands.

authoritative reference for credential handling.

See the companion skill ``databricks-ai-gateway-fm-apis`` (in ``_local/skills``)
for the routing table, worked examples, and a runnable ``probe.py`` that

Member

I don't think this exists? What is the referenced skill?

"model": ("llm_model",),
"api_key": ("llm_api_key",),
"base_url": ("llm_base_url",),
# OpenHands web app stores the Databricks workspace URL in llm_base_url.

Member

I suggest not relying on what the web app is doing. First, we are in the middle of reworking it; second, architecturally, the SDK needs to support all client applications rather than rely on what one client app does.

@enyst left a comment

Member

Thank you for the contribution, @prasadkona! I find it very interesting that your description points out the four major LLM APIs. I recently rewrote LLM API clients from scratch elsewhere (not in this repo), and I found that working with each API separately may indeed be a more flexible and less error-prone approach than litellm's attempt to convert everything to an OpenAI-compatible format. But that's just me.

In OpenHands, we’ve had litellm since forever, and I’m not sure we are ready to add another generic provider. Just as a heads up, this is a complicated proposal.

I realize this is a draft, so no hurry. Just for the record, I would love to know more about the differences and why Databricks should be added. Thank you for the work on it!

prasadkona added 15 commits May 24, 2026 16:33
Adds DatabricksLLM — a native provider for the Databricks AI Gateway that
bypasses LiteLLM and routes directly to the correct per-family endpoint:

- Anthropic Claude  → /anthropic/v1/messages
- Google Gemini     → /gemini/v1/generateContent
- OpenAI GPT-5+     → /openai/v1/responses  (gpt-\d routing rule)
- All others        → /mlflow/v1/chat/completions
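The per-family dispatch listed in this commit amounts to a lookup from family to gateway path. A sketch, assuming the path is simply appended to a gateway base URL; the real client may embed the endpoint name or other segments, and `GATEWAY_PATHS` and `gateway_url` are hypothetical names.

```python
from enum import Enum


class ProviderFamily(str, Enum):
    OPENAI = "openai"
    OPENAI_RESPONSES = "openai_responses"
    ANTHROPIC = "anthropic"
    GEMINI = "gemini"


# Gateway path per family, taken from the list in the commit message.
GATEWAY_PATHS = {
    ProviderFamily.ANTHROPIC: "/anthropic/v1/messages",
    ProviderFamily.GEMINI: "/gemini/v1/generateContent",
    ProviderFamily.OPENAI_RESPONSES: "/openai/v1/responses",
    ProviderFamily.OPENAI: "/mlflow/v1/chat/completions",
}


def gateway_url(base_url: str, family: ProviderFamily) -> str:
    """Join the gateway base URL with the family-specific path."""
    return base_url.rstrip("/") + GATEWAY_PATHS[family]
```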

Auth: PAT, M2M (service-principal), CLI profile, and U2M (browser SSO via
databricks-sdk). All auth strategies resolve credentials lazily so saving
settings succeeds before the optional databricks-sdk package is installed.

Base class changes are minimal and PR-friendly:
- `LLM`: slim 15-line dispatch validator (generic subclass discovery, no
  hardcoded names); no new fields on the base class
- `AgentBase.llm` + `LLMSummarizingCondenser.llm`: `SerializeAsAny`
  annotation so DatabricksLLM fields survive agent save/load round-trips
- `model_features.py`: early-return guard for `databricks/` prefix
- `__init__.py`: additive `create_llm` factory

Includes 275 unit tests covering auth, client, discovery, routing,
native API translation (multi-turn tool calls, Responses API format),
resilience, and settings bridge.
…ator

The safety_settings field was removed from LLM. The @field_validator
for it needs check_fields=False to avoid a Pydantic startup error when
the field no longer exists in the model.
…ate kwarg

When DatabricksLLM is constructed with stream=True the base LLM.completion()
passes stream=True through **kwargs in addition to enable_streaming. Pop it
before forwarding to DatabricksFMAPIClient.chat_completion() to prevent the
'multiple values for keyword argument stream' TypeError.
… forwarding

extra_headers and extra_body are litellm-specific conventions that the base
LLM class injects into call kwargs. DatabricksLLM._transport_call previously
forwarded these via **kwargs into DatabricksFMAPIClient.chat_completion(),
which serialised them as JSON body fields — causing HTTP 400 errors from the
AI Gateway (e.g. gpt-5-mini: "Unknown parameter: 'extra_headers'").

Fix: explicitly pop extra_headers, extra_body, and stream from kwargs before
forwarding to chat_completion(). stream was already popped; this commit
extends the strip list to cover the two new offenders.

Tests: two new unit tests verify the strip at both the _transport_call layer
(test_llm.py) and the client layer (test_client.py). Full Databricks suite:
278 passed.
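The two fixes above (stream, then extra_headers/extra_body) share one pattern: drop litellm-only keys before forwarding kwargs, and pass streaming explicitly. A sketch with hypothetical stand-ins for `_transport_call` and `chat_completion`; without the strip, `stream` would arrive twice and raise the "multiple values for keyword argument" TypeError.

```python
from typing import Any, Dict, List

# Keys the base LLM injects for litellm; the gateway rejects them as body fields.
_LITELLM_ONLY_KWARGS = ("stream", "extra_headers", "extra_body")


def strip_litellm_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of kwargs without litellm-specific keys."""
    return {k: v for k, v in kwargs.items() if k not in _LITELLM_ONLY_KWARGS}


def chat_completion(messages: List[dict], *, stream: bool = False, **body: Any) -> dict:
    """Hypothetical stand-in for the client: remaining kwargs become JSON body fields."""
    return {"stream": stream, "body": body}


def transport_call(messages: List[dict], enable_streaming: bool, **kwargs: Any) -> dict:
    """Hypothetical stand-in for DatabricksLLM._transport_call."""
    return chat_completion(messages, stream=enable_streaming, **strip_litellm_kwargs(kwargs))
```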
DatabricksLLM.databricks_client_secret is a SecretStr field that was not
registered in LLM_SECRET_FIELDS, so the base _serialize_secrets field
serializer never fired for it.  On AgentStore.save() the field was written as
"**********" to agent_settings.json; on reload that masked string was sent
to the Databricks OIDC /v1/token endpoint, causing a 401 on every M2M session
restart.

Fix: add a dedicated @field_serializer("databricks_client_secret") on
DatabricksLLM that delegates to serialize_secret() and converts any returned
SecretStr to the REDACTED_SECRET_VALUE string (avoiding Pydantic warnings).
When AgentStore.save() passes context={"expose_secrets": True} the plaintext
value is written correctly and round-trips through model_validate_json().

Adds test_m2m_client_secret_serialized_as_plaintext_with_expose_secrets to
cover the redact / plaintext / round-trip paths.
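The serializer's contract (redact by default, plaintext only when the caller opts in, never send the placeholder upstream) can be modelled in plain Python. This is not the Pydantic code: `serialize_secret_field` and `load_secret` are hypothetical helpers, while `REDACTED_SECRET_VALUE` and the `"**********"` value come from the commit message.

```python
from typing import Any, Mapping, Optional

REDACTED_SECRET_VALUE = "**********"  # placeholder written when secrets are hidden


def serialize_secret_field(secret: Optional[str], context: Optional[Mapping[str, Any]]) -> Optional[str]:
    """Redact by default; emit plaintext only when context asks for expose_secrets."""
    if secret is None:
        return None
    if context and context.get("expose_secrets"):
        return secret
    return REDACTED_SECRET_VALUE


def load_secret(value: Optional[str]) -> Optional[str]:
    """Treat the redacted placeholder as "no secret" instead of sending it to /v1/token."""
    if value == REDACTED_SECRET_VALUE:
        return None
    return value
```

The round trip only preserves the secret when the save path passes `{"expose_secrets": True}`, which is the case the new test covers.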
str.replace('Bearer ', '') replaces ALL occurrences — safe in practice since
tokens never contain that string, but split(' ', 1)[1] is more idiomatic and
defensive. Applies to both PROFILE and UNIFIED auth strategy get_token() closures.
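The difference between the two approaches shows up with a contrived token that happens to contain the scheme string. `token_from_header` is a hypothetical name for the `split(' ', 1)[1]` logic; note that it raises IndexError if the header has no space at all.

```python
def token_from_header(header_value: str) -> str:
    # split(" ", 1) cuts only at the first space, so the token is returned intact.
    return header_value.split(" ", 1)[1]


header = "Bearer abcBearer xyz"  # contrived token containing "Bearer "
print(header.replace("Bearer ", ""))  # every occurrence removed: abcxyz
print(token_from_header(header))      # first space only: abcBearer xyz
```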
- Eagerly import DatabricksLLM in sdk/llm/__init__.py so it registers
  with LLM.__subclasses__() at module-load time. This allows the agent
  server's _dispatch_to_provider_subclass validator to reconstruct a
  DatabricksLLM from serialized JSON (provider="databricks") without
  requiring an explicit import in the agent server process.

- Add public close() method to DatabricksLLM to avoid reaching into the
  private _db_client attribute from callers.

- Standardize USER_AGENT to "OpenHandsOSS/<version>" in utils.py.

- Add databricks_host alias in UserInfoAliases (settings_bridge.py) so
  llm_base_url from user settings correctly populates databricks_host
  when constructing DatabricksLLM kwargs.

- Add context/skills compatibility shims (__init__.py, skill.py,
  utils.py) re-exporting Skill-related symbols that moved within the
  SDK, preventing ImportError in the agent server subprocess.
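Why the eager import matters: `__subclasses__()` only sees classes whose definitions have already executed. A plain-Python sketch of the dispatch idea (the real code is a Pydantic model validator); `from_dict` and the `provider` class attribute are hypothetical stand-ins.

```python
from typing import Any, Dict, Type


class LLM:
    provider: str = "litellm"  # hypothetical default discriminator

    def __init__(self, **fields: Any) -> None:
        self.fields = fields

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "LLM":
        """Rebuild the matching subclass from serialized data via its provider key."""
        target: Type[LLM] = cls
        for sub in cls.__subclasses__():  # direct subclasses defined so far
            if sub.provider == data.get("provider"):
                target = sub
                break
        return target(**data)


class DatabricksLLM(LLM):
    # Only discoverable by from_dict once this class statement has run,
    # hence the eager import in sdk/llm/__init__.py.
    provider = "databricks"
```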
test_user_agent_format previously asserted startswith("openhands_oss/")
which broke when the constant was renamed to "OpenHandsOSS/<version>".
Update assertion to match the new canonical product name.

Also update the discovery test docstring that referenced the old prefix.
…elds to DatabricksLLM

These fields store the custom OAuth app credentials used in the U2M PKCE browser
flow. Previously they only existed in the CLI's SettingsFormData and were lost after
the first PKCE sign-in because kwargs_from_settings only extracts _BRIDGE_FIELDS.

Adding them to DatabricksLLM allows them to:
- Survive round-trips through model_dump_json / model_validate_json (agent settings)
- Be preserved when rebuilding the LLM after PKCE token exchange
- Be read back by the settings UI so the auth method shows as U2M on re-open
Add databricks_u2m_client_secret as a SecretStr field on DatabricksLLM
with a matching field_serializer (mirrors databricks_client_secret for
M2M). Add it to _BRIDGE_FIELDS so kwargs_from_settings passes it through
to create_llm.

Without this, the U2M client secret was never written to
agent_settings.json; every CLI restart cleared the field, causing the
PKCE token exchange to fail with 401 Unauthorized for confidential apps.
…arding

- auth.py: _resolve_u2m accepts optional client_secret and includes it in
  refresh-token requests for confidential OAuth apps. resolve_credentials
  forwards databricks_u2m_client_secret to _resolve_u2m.
- llm.py: add field_validator for databricks_u2m_client_secret that calls
  validate_secret() to coerce str→SecretStr and discard redacted placeholders.
- settings_bridge.py: add databricks_u2m_client_secret to _SECRET_FIELDS so
  it is coerced to SecretStr and never logged in plaintext.
- test_auth.py: add _make_mock_llm databricks_u2m_client_secret param;
  new tests for confidential-client refresh and resolve_credentials forwarding.
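The confidential-client change boils down to an optional field in the OAuth 2.0 refresh-token grant (RFC 6749, section 6, with client authentication in the request body). A sketch; `refresh_token_body` is a hypothetical helper, not `_resolve_u2m` itself.

```python
import urllib.parse
from typing import Optional


def refresh_token_body(client_id: str, refresh_token: str, client_secret: Optional[str] = None) -> bytes:
    """Form body for an OAuth 2.0 refresh_token grant."""
    fields = {
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
        "client_id": client_id,
    }
    # Confidential apps must also authenticate; public PKCE apps omit the secret.
    if client_secret:
        fields["client_secret"] = client_secret
    return urllib.parse.urlencode(fields).encode()
```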
…2026 FMAPI

discovery.py — CURATED_DATABRICKS_MODELS:
- Claude: claude-sonnet-4-6 (new recommended), keep 4-5/haiku-4-5;
  add opus-4-7, opus-4-5 (current flagships); keep opus-4-1
- GPT-5: gpt-5-mini stays recommended; add gpt-5-5-pro, gpt-5-5,
  gpt-5-4, gpt-5-4-mini; keep gpt-5 and gpt-oss-120b
- Gemini: gemini-3-5-flash (new recommended); add gemini-3-flash,
  gemini-3-pro; keep gemini-2-5-flash/pro

llm.py — DATABRICKS_CONTEXT_WINDOWS / DATABRICKS_MAX_OUTPUT:
- Remove stale pre-Claude-4 entries (claude-3-5-sonnet-2,
  claude-3-7-sonnet, dbrx-instruct, mixtral-8x7b, llama-3-1-70b)
- Rename meta-llama-4-maverick → llama-4-maverick (matches FMAPI docs)
- Add full GPT-5 codex/numbered variant line (5-1 through 5-5-pro)
- Add Gemini 3 series (gemini-3-flash, 3-5-flash, 3-pro, 3-1-pro,
  3-1-flash-lite)
- Add Qwen/Gemma/Llama-3-1-8b entries
conversation_error.py:
- Add intelligent user-facing hints for Databricks-specific errors:
  - [404] AI Gateway endpoint does not exist → endpoint name / gateway
    URL mismatch guidance
  - [401] UNAUTHORIZED → token expired / wrong workspace guidance
  - [429] RATE_LIMIT_EXCEEDED → quota / retry guidance
  - [403] Invalid access to Org → cross-geography model serving note
    with recommendation to use Refresh Models and pick a supported model
- Hints are surfaced in the ConversationErrorEvent.visualize property
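One way to express these hints is a table keyed on status code plus a marker string from the error message. The markers below come from the commit message; the matching strategy, the hint wording, and `databricks_hint` are assumptions for illustration.

```python
from typing import Optional

# (status, marker text, hint); markers are taken from the commit message.
_DATABRICKS_HINTS = [
    (404, "AI Gateway endpoint does not exist",
     "Check the endpoint name and the gateway URL."),
    (401, "UNAUTHORIZED",
     "The token may have expired, or it belongs to a different workspace."),
    (429, "RATE_LIMIT_EXCEEDED",
     "Quota reached; wait and retry, or lower the request rate."),
    (403, "Invalid access to Org",
     "The model may need cross-geography serving; use Refresh Models and pick a supported model."),
]


def databricks_hint(status: int, message: str) -> Optional[str]:
    """Return a user-facing hint for a known Databricks error, else None."""
    for code, marker, hint in _DATABRICKS_HINTS:
        if status == code and marker in message:
            return hint
    return None
```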

discovery.py:
- Remove databricks-gemini-3-flash and databricks-gemini-3-pro from
  CURATED_DATABRICKS_MODELS; these require cross-geography routing not
  available in all workspaces and cause confusing 403 errors
- Add context-window and max-output metadata for all verified models
@all-hands-bot

Collaborator

[Automatic Post]: It has been a while since there was any activity on this PR. @prasadkona, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

This comment was created by an AI agent (OpenHands) on behalf of the user.

…rors

- Add missing required_positional_arg to ModelFeatures.__init__ to match
  updated signature (fixes TypeError on CLI startup)
- Handle PermissionError gracefully when iterating workspace directory in
  find_third_party_files (fixes crash on macOS TCC-restricted paths)
- Update Databricks provider utils for correct base_url resolution
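The PermissionError fix follows a common pattern: treat an unlistable directory as empty instead of crashing. A sketch with a hypothetical `iter_entries_safely`, not the actual `find_third_party_files` code.

```python
import os
from typing import Iterator


def iter_entries_safely(path: str) -> Iterator[os.DirEntry]:
    """Yield directory entries, skipping directories the OS refuses to list."""
    try:
        with os.scandir(path) as it:
            for entry in it:
                yield entry
    except PermissionError:
        # e.g. macOS TCC-restricted folders when the process lacks access
        return
```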
@prasadkona force-pushed the feat/databricks-native-provider branch from fdfe2e1 to fc0f735 on June 7, 2026 02:53
@prasadkona changed the title from "feat(llm): add native Databricks AI Gateway provider" to "feat(llm): add native Databricks Unity AI Gateway provider" on Jun 7, 2026
Restore skills modules and the safety_settings deprecation validator to
upstream main — these were fork-drift artifacts unrelated to the Databricks
provider. The connector does not depend on them.
Consolidate the Authorization Code + PKCE browser-login primitives into a
single SDK module so the OpenHands web app and CLI no longer maintain
separate copies. Provides generate_pkce, build_authorize_url, and both sync
and async code-for-token exchange helpers, exported from the databricks
provider package. Bumps SDK to 1.27.0.
Remove obsolete model entries from family-detection parametrizations and
update curated-model assertions to current Foundation Model API names.
…PKCE helpers

Bring the provider README in sync with the code: add pkce.py and
settings_bridge.py to the module-layout table, note the __init__ now exports the
PKCE helpers, and add an Authentication paragraph describing the shared U2M
browser-login helpers (generate_pkce / build_authorize_url /
exchange_code_for_tokens) consumed by both the web backend and the CLI.
Add an Alignment with ucode section to the provider README: the connector's
PROFILE/UNIFIED/U2M strategies let an OpenHands agent reach AI Gateway the
same key-free, governed, workspace-credential way Databricks ucode does.

3 participants