Release v1.10.0: SDK auth migration, summarization agent, and Lakebase fixes by forrestmurray-db · Pull Request #125 · databricks-solutions/vibescaler

forrestmurray-db · 2026-04-13T18:40:48Z

Summary

SDK Auth Migration: Replace manual token storage with Databricks SDK-based authentication (resolve_databricks_token()). Removes DatabricksTokenDB model, token input fields, and DATABRICKS_TOKEN env var mutations. All services now use SDK auth.
Summarization Agent Overhaul: Refactor summarization to a tool-based agent with span data resolution. Add facilitator visibility into summarization status/results, job tracking via SummarizationJob table, and resummarize capability.
Lakebase Fixes: Switch to do_connect token injection, fix connection pool settings, and update specs with pool requirements and service principal permissions.
Docs: Update facilitator guide for Lakebase and Git-based deployment, fix setup prerequisites.
Bug Fixes: Deduplicate convertTraceToTraceData for summary propagation, handle databricks_host with existing https:// prefix, resolve available-models without mlflow intake config.

Changes (63 files, +4839 / -1206)

Auth (12 commits)

Add resolve_databricks_token() utility using Databricks SDK
Remove DatabricksTokenDB model and databricks_tokens table
Remove token input fields from IntakePage and DBSQLExportPage
Replace token_storage patterns across all services and routers
Update TypeScript models and service docstrings

Summarization (7 commits)

Refactor to tool-based agent with span data resolution
Add facilitator visibility into summarization status and results
New SummarizationJob model and migration (0018)
Fix summary propagation through convertTraceToTraceData
Use SDK auth and separate DB session for background tasks

Lakebase & Database (3 commits)

Switch to do_connect token injection for Lakebase
Fix pool settings for Databricks SQL connections
Update specs with connection pool requirements

Docs (4 commits)

Update facilitator guide for Lakebase and Git-based deployment
Add service principal permissions to AUTHENTICATION_SPEC
Fix Lakebase setup prerequisites

Test plan

Verify SDK auth works end-to-end (token resolution, service initialization)
Test summarization agent with tool-based flow
Confirm facilitator dashboard shows summarization status
Verify Lakebase connection pool behavior
Run just test-server — all backend tests pass
Run just ui-test-unit — all frontend tests pass
Run just e2e — end-to-end tests pass

🤖 Generated with Claude Code

Replace the hardcoded MODEL_MAPPING with a live API call to Databricks serving-endpoints. The backend uses async httpx to avoid blocking the event loop, and the frontend fetches models via useAvailableModels and builds options dynamically with buildModelOptions. All components now store and pass endpoint names directly instead of translating between display names and backend names. Also switches model prefetching from an eager useEffect in WorkflowContext to intent-based prefetchQuery on hover/focus of navigation buttons, and clears Databricks auth env vars that can override token auth in the MLflow intake service. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Replace stale hasMlflowConfig references in DiscoveryAnalysisTab with modelOptions.length checks to match the switch to dynamic model listing. Fix discovery-complete endpoint returning 404 for facilitators whose workshop_id is NULL by also checking against workshop.facilitator_id. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Prevent worktree contents from being tracked. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…vice init Add a public resolve_databricks_token() function that uses the Databricks SDK for auth (service principal on Apps, CLI profile locally) with a fallback to DATABRICKS_TOKEN env var. Remove the token_storage/db_service fallback chain from DatabricksService.__init__. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
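
The resolution order described in this commit can be sketched as follows. This is a minimal illustration, not the project's implementation: the `sdk_lookup` and `env` parameters are hypothetical and exist only so the example runs without the Databricks SDK installed. The real function presumably calls the SDK directly.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_databricks_token(
    sdk_lookup: Callable[[], Optional[str]],  # hypothetical: wraps the SDK auth call
    env: Mapping[str, str] = os.environ,
) -> str:
    """Return a token from SDK auth, else fall back to DATABRICKS_TOKEN."""
    try:
        token = sdk_lookup()
    except Exception:
        # SDK auth unavailable (no service principal, no CLI profile).
        token = None
    if token:
        return token
    fallback = env.get("DATABRICKS_TOKEN", "")
    if fallback:
        return fallback
    raise RuntimeError(
        "No Databricks credentials: SDK auth failed and DATABRICKS_TOKEN is unset"
    )
```

Raising when both sources are empty keeps a misconfigured deployment from sending an empty bearer token.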

MLflow uses whatever Databricks auth the SDK provides. Stop setting DATABRICKS_TOKEN in the environment — only set DATABRICKS_HOST so the SDK knows which workspace to target. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Mark databricks_token as deprecated with empty default in Python models (MLflowIntakeConfig, MLflowIntakeConfigCreate, DBSQLExportRequest, DatabricksConfig) and optional in TypeScript models. SDK auth is used instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…outer Replace 10+ token_storage.get_token / db_service.get_databricks_token fallback chains with resolve_databricks_token(). Remove all os.environ["DATABRICKS_TOKEN"] mutations. Update test mocks to patch resolve_databricks_token instead of token_storage. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…outers Update discovery_service (7 refs), judge_service, draft_rubric_grouping, database_service, databricks router, dbsql_export router. Remove set/get_databricks_token methods from database_service. Update test mocks to patch resolve_databricks_token. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove the token persistence infrastructure:
- DatabricksTokenDB SQLAlchemy model from database.py
- databricks_tokens from postgres_manager ALLOWED_TABLES and CREATE TABLE
- DatabricksTokenDB import from database_service.py
- test_token_storage_service.py (5 tests for deleted functionality)
- Update postgres_manager test expectations
token_storage_service.py is kept for Custom LLM API key storage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…Page Users no longer need to provide Databricks tokens — the backend uses SDK auth (service principal on Apps, CLI profile locally). Remove all token state, localStorage persistence, form fields, and validation from both pages. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove os.environ["DATABRICKS_TOKEN"] and DATABRICKS_CLIENT_ID/SECRET pop() calls from alignment_service, judge_service, dbsql_export_service, and database_service. The SDK handles auth automatically — only DATABRICKS_HOST needs to be set for MLflow to know which workspace. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

AUTHENTICATION_SPEC:
- Rewrite Architecture Context to describe the two-layer model accurately
- Add new "Databricks API Authentication" section with token resolution contract, environment-specific behavior, MLflow auth, and what was removed
- Add "Future: Per-User Auth" subsection for OBO pattern
- Add 8 success criteria for Databricks API auth
- Mark SDK Auth Migration as complete in implementation log
BUILD_AND_DEPLOY_SPEC:
- Mark DATABRICKS_TOKEN as optional (SDK auth preferred) in env vars table
- Update Databricks Apps Authentication section to reference resolve_databricks_token() and link to AUTHENTICATION_SPEC
JUDGE_EVALUATION_SPEC:
- Fix troubleshooting note: "host, token" → "host, experiment ID + SDK auth"
- Add SDK Auth Migration to implementation log
README.md:
- Add keyword index entries: PAT, SDK auth, resolve_databricks_token, service principal, DATABRICKS_TOKEN, DATABRICKS_CLIENT_ID, OAuth, CLI profile
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Document the Databricks resources the app's service principal needs access to: MLflow Experiment (Can edit), Model Serving Endpoints (Can query), SQL Warehouse (Can use), Unity Catalog Volume (Can read and write). Note which are required vs optional. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Lakebase (PostgreSQL) is the primary production database. Its OAuth tokens are refreshed via WorkspaceClient().config.oauth_token() every 15 minutes. Split permissions into core (Lakebase, MLflow, Serving Endpoints) vs optional (SQL Warehouse, UC Volume). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

AUTHENTICATION_SPEC:
- Add "Lakebase Connection Pool" section with token lifecycle, do_connect injection pattern, required pool settings, credential API, and setup prerequisites, all with links to Databricks docs
- Update Lakebase row in permissions table to reference generate_database_credential
- Add 7 Lakebase connection pool success criteria
- Add implementation log entry
BUILD_AND_DEPLOY_SPEC:
- Add Lakebase env vars (PGHOST, PGDATABASE, PGUSER, PGPORT, PGSSLMODE, PGAPPNAME, ENDPOINT_NAME, DATABASE_ENV) to environment variables table
- Add implementation log section with SDK auth and Lakebase pool entries
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ings
Replace the creator-based connection factory with the recommended do_connect event pattern from Databricks docs. Key changes:
- OAuthTokenManager → LakebaseCredentialManager using generate_database_credential(endpoint=ENDPOINT_NAME) API
- Token injection via do_connect event (not creator callable)
- pool_recycle: 300s → 3600s (was causing excessive connection churn)
- pool_pre_ping: True → False (conflicts with do_connect injection)
- max_overflow: 10 → 5 (caps at 20 total across 2 workers)
- postgres_manager: pool created once with custom OAuthConnection class, never recreated on token refresh
- database.py: _reset_connection_pool no longer calls force_refresh
Reference: https://docs.databricks.com/aws/en/lakebase/connect/custom-app.html
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Remove databricks_token from CSV upload body type, make DatabricksConfig.token optional, update ApiService/WorkshopsService docstrings to reflect SDK auth. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

When Lakebase is added as a Databricks App resource, the platform automatically creates a Postgres role for the service principal. Manual databricks_create_role() is only needed for external/additional identities outside the App resource integration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ndency
- Add summarization_enabled, summarization_model, summarization_guidance columns to WorkshopDB
- Add summary (JSON) column to TraceDB for structured milestone views
- Add corresponding Pydantic model fields and DB service methods
- Add pydantic-ai-slim[openai] dependency
- Create TRACE_SUMMARIZATION_SPEC with success criteria
- Create implementation plan
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

… with batch support Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…raceViewer Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ingestion
- PUT /workshops/{id}/summarization-settings for facilitator config
- POST /workshops/{id}/resummarize for on-demand re-summarization
- Background summarization triggered after MLflow trace ingestion
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…odelOptions The settings agent used a function name that doesn't exist in modelMapping.ts. Fixed to follow the same pattern as other components: useAvailableModels() + buildModelOptions(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…s fork The FastAPI lifespan bootstrap ran migrations in each worker process, requiring interprocess locks and never applying new migrations after initial deploy. Move migration execution to gunicorn's on_starting hook which runs exactly once in the master process before any workers fork. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
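
A minimal sketch of the hook placement described above, assuming a `gunicorn.conf.py` file. `run_migrations` and `MIGRATION_RUNS` are hypothetical stand-ins for the project's migration entry point. The worker count is taken from the "2 workers" mentioned elsewhere in this PR.

```python
# gunicorn.conf.py (illustrative sketch, not the project's actual config)
from typing import List

workers = 2  # assumption, based on the "2 workers" mentioned in this PR

MIGRATION_RUNS: List[str] = []  # hypothetical: records calls for illustration


def run_migrations() -> None:
    """Hypothetical stand-in for the real migration entry point."""
    MIGRATION_RUNS.append("upgrade head")


def on_starting(server) -> None:
    """Gunicorn calls this once in the master process, before workers fork."""
    run_migrations()
```

Because the hook runs before any fork, no interprocess lock is needed to keep workers from racing each other on the same migration.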

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

# Conflicts: # specs/BUILD_AND_DEPLOY_SPEC.md

…nd tasks
- Use resolve_databricks_token() instead of stored PAT (SDK auth compat)
- Create new SessionLocal() inside background tasks to avoid using the request-scoped DB session after it's closed
- Add logging for summarization completion
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add the server-synced project setup form, API read/update routes, setup progress handling, and V2-focused E2E harness coverage for the project setup slice. Refs #129, #134

Capture the shared V2 domain language, master north star, rubric research, design handoff sources, and to-prd skill so collaborators can review the control-surface direction from the repo.

Move the exported JBW V2 project design files into a stable top-level path so collaborators can use the surface prototypes as a guidepost for the V2 control surface work.

Capture the to-issues workflow and rubric calibration research so V2 planning agents can turn approved PRDs into vertical slices with the relevant grading context.

Resolve spec and coverage baseline conflicts by keeping the project setup UI wiring branch's server-synced setup/settings behavior and latest spec coverage baseline.

Align the branch with the provider-resolved auth implementation so local databases can resolve the auth-removal migration and runtime code no longer depends on legacy password auth.

Show the setup handoff before opening Workspace so developers can review first Sprint defaults, starter Rubric, starter Review Feed, and the rubric review readiness gate.

…mpatibility (#148) Migrations 0002, 0004, and 0010 used _is_postgres() to branch between sa.text("0")/sa.text("FALSE") for boolean server_default values. This check returned False on Lakebase (likely due to render_as_batch=True context), causing PostgreSQL to reject DEFAULT 0 for BOOLEAN columns with DatatypeMismatch. Replace with sa.false()/sa.true() — SQLAlchemy's dialect-agnostic boolean literals that render correctly on both SQLite and PostgreSQL. This matches the pattern already used by migration 0021. Strengthened the guard test to reject the _is_postgres() branching pattern and require sa.true()/sa.false() going forward. Co-authored-by: Isaac Co-authored-by: Max Fisher <max.fisher+data@databricks.com>

- Disable strict tool definitions on the trace summarization model profile so the Databricks OpenAI-compat shim accepts requests across Claude 4.6/4.7, gpt-5, gpt-5-codex, and Gemini Flash 3.5 (the shim rejects "tools.N.custom.strict" regardless of backing model).
- Set litellm.drop_params=True at DSPy LM construction so hardcoded sampling params (temperature=0.2/0.3 in discovery + followup) don't 400 on gpt-5 reasoning models that require temperature=1.
- Add a workspace-keyed TTL cache to DatabricksService.list_serving_endpoints with concurrent-request deduplication via per-key asyncio.Lock; the frontend's per-workshop React Query cache was triggering frequent upstream refetches of a workspace-global list. TTL configurable via DATABRICKS_ENDPOINTS_CACHE_TTL_S (default 300s).
Co-authored-by: Isaac
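
The caching behavior in the last bullet can be sketched roughly as below. This is an illustration under assumptions: `EndpointListCache` and its `get` signature are hypothetical names, and the real cache lives inside `DatabricksService.list_serving_endpoints`.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, List, Tuple


class EndpointListCache:
    """Workspace-keyed TTL cache allowing one upstream fetch per key at a time."""

    def __init__(self, ttl_s: float = 300.0) -> None:
        self._ttl_s = ttl_s
        self._entries: Dict[str, Tuple[float, List[str]]] = {}
        self._locks: Dict[str, asyncio.Lock] = {}

    def _fresh(self, workspace: str):
        entry = self._entries.get(workspace)
        if entry and time.monotonic() - entry[0] < self._ttl_s:
            return entry[1]
        return None

    async def get(
        self, workspace: str, fetch: Callable[[], Awaitable[List[str]]]
    ) -> List[str]:
        cached = self._fresh(workspace)
        if cached is not None:
            return cached
        lock = self._locks.setdefault(workspace, asyncio.Lock())
        async with lock:
            # Re-check: a concurrent caller may have filled the cache while we waited.
            cached = self._fresh(workspace)
            if cached is not None:
                return cached
            value = await fetch()
            self._entries[workspace] = (time.monotonic(), value)
            return value
```

The second freshness check inside the lock is what collapses concurrent requests into a single upstream call.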

* Add Docusaurus docs at /docs with spec coverage and local search. Serve built docs from FastAPI even when Lakebase is not configured, gate the app on setup status, and embed spec coverage in all specifications. * Fix docs/app navigation and prevent localhost redirect on Databricks Apps. Rewrite internal redirect Location headers, add cross-links between the workshop UI and Docusaurus, and keep setup redirects on the public app hostname.

Resolves the Lakebase pool-exhaustion cascade in gh#163. Three things were compounding under concurrent workshop load:
1. ENDPOINT_NAME was never bound in app.yaml, so credential generation silently fell back to a workspace OAuth token instead of using generate_database_credential(endpoint=...). Adds `valueFrom: postgres` binding and removes the silent fallback; missing ENDPOINT_NAME now fails loudly at engine creation.
2. _is_connection_error() matched the substring "connection timed out", which is also present in SQLAlchemy QueuePool TimeoutError. Pool saturation was being misclassified as a transient connection error, triggering engine.dispose() during the retry path and dropping in-flight connections held by other concurrent requests, amplifying the outage. TimeoutError is now explicitly rejected; DatabaseErrorMiddleware still returns a clean 503 for the client.
3. pool_pre_ping=True added a SELECT 1 per checkout (increasing checkout latency under contention) and pool_recycle=2700 forced unnecessary connection churn. Both realigned to AUTHENTICATION_SPEC.md:118-119 (pool_pre_ping=False, pool_recycle=3600).
Co-authored-by: Isaac
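
The misclassification in item 2 can be illustrated with a dependency-free sketch. `PoolTimeoutError` is a stand-in for SQLAlchemy's pool `TimeoutError`, and the marker strings other than "connection timed out" are assumptions, not the project's actual list.

```python
class PoolTimeoutError(Exception):
    """Stand-in for sqlalchemy.exc.TimeoutError, so the sketch has no dependencies."""


# Only "connection timed out" comes from the commit message; the rest are assumed.
_TRANSIENT_MARKERS = (
    "connection timed out",
    "connection refused",
    "server closed the connection",
)


def is_connection_error(exc: BaseException) -> bool:
    """Return True only for transient network errors worth a dispose-and-retry."""
    if isinstance(exc, PoolTimeoutError):
        # Pool saturation: disposing the engine would drop healthy connections.
        return False
    message = str(exc).lower()
    return any(marker in message for marker in _TRANSIENT_MARKERS)
```

Checking the exception type before the message avoids the substring collision, since the pool timeout text also contains "connection timed out".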

Two client-side fixes for Gemini-backed Databricks serving endpoints, plus an integration test that pins the cross-provider API matrix.
- Replace null `id` on chat completion responses with a placeholder so OpenAI SDK 2.x's Pydantic validator doesn't reject. Installed as an httpx response hook on the shared OpenAI client; mutates only chat completion shapes (other JSON like endpoint listings passes through).
- Normalize `message.content` when it arrives as Gemini's array of part dicts (`[{type:"text", text:..., thoughtSignature:...}]`) into a plain string so `discovery_analysis_service`, `rubric_generation_service`, and similar callers don't have to special-case Gemini.
- Add an integration test that probes every model in the workshop's picker against both Chat Completions and Responses API. Pins the design constraint that Databricks' Responses API passthrough is OpenAI-only (Claude/Gemini/Llama reject by design, so they must stay on Chat Completions). Skipped automatically when Databricks creds aren't configured; runnable on demand via `just test-integration`.
Co-authored-by: Isaac
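
The content normalization in the second bullet might look roughly like this. It is a sketch under assumptions: the function name is hypothetical, and non-text parts are simply skipped, which may differ from the real helper.

```python
from typing import Any


def normalize_message_content(content: Any) -> Any:
    """Collapse an array of Gemini-style part dicts into one plain string."""
    if not isinstance(content, list):
        # Already a string (or None): pass through untouched.
        return content
    texts = []
    for part in content:
        if isinstance(part, dict) and part.get("type") == "text":
            texts.append(str(part.get("text", "")))
    return "".join(texts)
```

Extra keys such as `thoughtSignature` are dropped, because callers only expect the text.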

Trace summarization on Gemini 3.5 Flash via Databricks' OpenAI-compat shim breaks on the second turn: the OpenAI Chat Completions wire format has no slot for Gemini's ``thought_signature``, and Gemini 3+ requires it round-tripped per turn. Route Gemini through the native passthrough at ``/ai-gateway/gemini`` using pydantic-ai's ``GoogleModel``, which handles ``thought_signature`` natively. Other foundation models (Claude, gpt-5, Llama) keep going through the OpenAI shim with ``OpenAIChatModel``.
- ``TraceSummarizationService`` detects Gemini-family endpoint names at construction and builds ``GoogleModel`` over a ``google.genai.Client`` pointed at the workspace's ai-gateway/gemini URL with the Databricks token in the Authorization header.
- Force httpx transport via ``HttpOptions.httpx_async_client``. google-genai prefers aiohttp when both are installed, which silently bypasses our request hook (see ``_use_aiohttp`` in google.genai._api_client).
- Install an httpx request hook that strips ``id`` from outgoing ``functionCall``/``functionResponse`` parts before they reach the ai-gateway. Vertex AI's ``FunctionCall`` proto has no ``id`` field, but the google-genai SDK adds one when echoing the model's previous tool call back; the ai-gateway is a pure passthrough that doesn't strip it, so multi-turn requests 400 without this. The hook rewrites ``request.stream`` (where httpx actually reads body from), not just ``_content``, otherwise the wire body still carries the original ``id`` while Content-Length reflects the new size.
- Add ``pydantic-ai-slim[google]`` extra to bring in ``google-genai``.
Tests:
- Unit: model-routing dispatch (Gemini → GoogleModel, others → OpenAIChatModel), Gemini client's base_url points at ai-gateway, function_call/function_response id strip, no-op on simple text turns.
- Integration: live multi-turn summarization against ``databricks-gemini-3-5-flash``. Exercises the full chain end-to-end and acts as a regression guard if Databricks ships changes to the ai-gateway proto.
Co-authored-by: Isaac
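
The `id`-stripping step can be sketched as a pure body transform. The function name and the `contents`/`parts` request layout are assumptions for illustration. The httpx wiring, which rewrites `request.stream` and Content-Length, is deliberately left out.

```python
import json


def strip_function_ids(body: bytes) -> bytes:
    """Remove `id` from functionCall/functionResponse parts in a JSON request body."""
    payload = json.loads(body)
    changed = False
    for content in payload.get("contents", []):
        for part in content.get("parts", []):
            for key in ("functionCall", "functionResponse"):
                inner = part.get(key)
                if isinstance(inner, dict) and "id" in inner:
                    del inner["id"]
                    changed = True
    # Return the original bytes untouched when nothing was stripped (text-only turns).
    return json.dumps(payload).encode("utf-8") if changed else body
```

Returning the original bytes on a no-op means simple text turns need no Content-Length adjustment.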

…dation
Follow-up to 53d9e70. The previous fix bound ENDPOINT_NAME via `valueFrom: postgres`, but the Apps platform exposes the Lakebase endpoint identifier under the default resource alias `database`. `valueFrom: postgres` resolved to an empty string at runtime, and db_bootstrap.py (which runs in the gunicorn on_starting hook before create_engine_for_backend()) bypassed the engine-creation guard and handed the empty value straight to the SDK, crashing app startup.
- app.yaml: `valueFrom: postgres` → `valueFrom: database`.
- LakebaseCredentialManager.get_password() now validates the endpoint argument itself, so all three call sites (do_connect handler, db_bootstrap, postgres_manager) surface the same actionable error on misconfiguration rather than the opaque SDK protobuf failure.
Co-authored-by: Isaac
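
Centralizing the check inside the credential manager could look something like this. The constructor argument `generate_credential` is a hypothetical injection point standing in for the SDK's `generate_database_credential` call, and the error text is illustrative.

```python
from typing import Callable, Optional


class LakebaseCredentialManager:
    """Sketch: validates the endpoint before asking for a database credential."""

    def __init__(self, generate_credential: Callable[[str], str]) -> None:
        # Hypothetical: wraps the SDK credential call so the sketch is testable.
        self._generate = generate_credential

    def get_password(self, endpoint: Optional[str]) -> str:
        if not endpoint or not endpoint.strip():
            raise ValueError(
                "ENDPOINT_NAME is empty; check the valueFrom binding for the "
                "Lakebase resource in app.yaml"
            )
        return self._generate(endpoint.strip())
```

With the guard in `get_password()`, every caller gets the same message regardless of whether it runs before or after engine creation.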

The /discovery-comments/stream and /discovery-agent-runs/{id}/stream routes bound `db: Session = Depends(get_db)`, holding one pool connection per subscribed EventSource for the entire stream lifetime. Each DiscoveryTraceCard opens an EventSource for the comments stream (always) plus the agent-run stream (while a run is active), so a single user with ~10 visible trace cards already approaches the pool ceiling (5+5 per worker × 2 workers = 20). Combined with any background-worker connections, this is what was driving the production cascade on gh#163 after pool-timeout retries were neutered in 53d9e70. Refactor both routes to acquire SessionLocal() per poll iteration and release before the sleep, so connection holding time drops from the stream lifetime to single-digit milliseconds per query. Co-authored-by: Isaac
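
The per-iteration session pattern can be sketched as an async generator. `stream_comments`, `fetch_new`, and `max_polls` are hypothetical names, and `session_factory` stands in for `SessionLocal`. The real routes presumably loop until the client disconnects.

```python
import asyncio
from typing import Any, AsyncIterator, Callable, ContextManager, Iterable


async def stream_comments(
    session_factory: Callable[[], ContextManager[Any]],  # e.g. SessionLocal
    fetch_new: Callable[[Any], Iterable[str]],           # hypothetical query helper
    max_polls: int,
    interval_s: float = 1.0,
) -> AsyncIterator[str]:
    """Yield SSE frames, holding a DB session only for the duration of each query."""
    for _ in range(max_polls):
        with session_factory() as session:
            rows = list(fetch_new(session))
        # The session is closed here, so nothing is held while emitting or sleeping.
        for row in rows:
            yield f"data: {row}\n\n"
        await asyncio.sleep(interval_s)
```

Materializing the rows with `list(...)` inside the `with` block matters: a lazy result consumed after the block would touch a closed session.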

Production reported repeated 502s from discovery analysis on Gemini: ``server received an invalid response from an upstream server``. The OpenAI-compat shim at ``/serving-endpoints/chat/completions`` translates Vertex AI responses into OpenAI shape, but for some Gemini outputs (safety blocks, certain content-part configurations) that translation fails and the shim returns a 502 instead of a usable response. Route Gemini chat completions through Databricks' native ai-gateway/gemini passthrough using ``google.genai.Client.models.generate_content``. The adapter returns the chat-completions dict shape callers already expect, so ``discovery_analysis_service`` and ``rubric_generation_service`` work unchanged. Other foundation models (Claude, gpt-5, Llama) stay on the OpenAI-compat shim; they don't have the response-shape issues that trip the translator for Gemini.
- ``DatabricksService.call_chat_completion`` detects Gemini endpoint names and dispatches to ``_call_gemini_chat_via_ai_gateway``.
- ``_get_gemini_client`` lazily builds and caches one ``google.genai.Client`` per workspace, pointed at ``{workspace}/ai-gateway/gemini`` with the Databricks token in the Authorization header.
- Helpers ``_messages_to_genai_contents`` and ``_genai_response_to_chat_shape`` translate between OpenAI chat messages and Gemini ``Content`` objects / ``GenerateContentResponse``. System messages collapse into ``system_instruction``; response text parts concatenate into the chat-completions string content.
- The existing ``_normalize_shim_content`` safety net stays in place for any non-Gemini model that ever returns array-shaped content.
Tests:
- Unit: Gemini endpoint names dispatch to the ai-gateway helper (and must NOT touch the OpenAI client); non-Gemini endpoints stay on the OpenAI client; helpers correctly translate messages and responses.
- Integration: live Gemini chat completion via ai-gateway returns a plain string content (the discovery_analysis_service contract).
Co-authored-by: Isaac
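
The message translation can be approximated with plain dicts. This is a sketch only: the real `_messages_to_genai_contents` builds `google.genai` `Content` objects, and the function name, return shape, and the choice to join system messages with blank lines are assumptions.

```python
from typing import Dict, List, Optional, Tuple


def messages_to_contents(
    messages: List[Dict[str, str]],
) -> Tuple[Optional[str], List[Dict[str, object]]]:
    """Split OpenAI-style chat messages into (system_instruction, Gemini contents)."""
    system_parts: List[str] = []
    contents: List[Dict[str, object]] = []
    for message in messages:
        role, text = message["role"], message["content"]
        if role == "system":
            # System messages collapse into a single system_instruction.
            system_parts.append(text)
            continue
        # Gemini uses "model" where OpenAI uses "assistant".
        gemini_role = "model" if role == "assistant" else "user"
        contents.append({"role": gemini_role, "parts": [{"text": text}]})
    system_instruction = "\n\n".join(system_parts) if system_parts else None
    return system_instruction, contents
```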

The Gemini ai-gateway routing for trace summarization and discovery analysis depends on the ``google.genai`` package, brought in via ``pydantic-ai-slim[google]`` in pyproject.toml. uv.lock was already updated, but requirements.txt — which the Databricks app build uses (``uv pip install -r requirements.txt``) — wasn't. The deployed app failed at runtime with ``No module named 'google.genai'``. Regenerated via: uv export --format requirements-txt --no-emit-project -o requirements.txt Co-authored-by: Isaac

Production hit 400 on discovery analysis with gpt-5.5: "Unsupported value: 'temperature' does not support 0.3 with this model. Only the default (1) value is supported." OpenAI reasoning models (gpt-5 / gpt-5.1 / gpt-5.5 / gpt-5-codex and the o1/o3/o4 series) reject any temperature != 1. LiteLLM has ``drop_params`` to handle this transparently on the DSPy path (already enabled in ``discovery_dspy._configure_litellm_drop_params``), but the OpenAI Python SDK that ``DatabricksService.call_chat_completion`` uses has no equivalent; we have to normalize the request ourselves.
- Add ``_is_openai_reasoning_model`` detector matching ``gpt-5``, ``o1``, ``o3``, ``o4`` endpoint names (with or without the ``databricks-`` prefix).
- Add ``_normalize_request_for_reasoning_model`` which forces ``temperature=1.0`` for detected reasoning models and logs the override for auditability.
- Apply normalization in both ``call_chat_completion`` and ``call_serving_endpoint`` so all caller paths benefit.
Verified live against dogfood-staging: ``databricks-gpt-5`` and ``databricks-gpt-5-mini`` now return content for a discovery-analysis-shaped request that previously 400'd.
Tests:
- Parametrized detector tests covering gpt-5, gpt-5-codex, gpt-5.1, gpt-5.5, o1-preview, o3-mini, o4-mini.
- Negative tests confirming Claude / Llama / Gemini / gpt-4o are NOT treated as reasoning models.
- Unit test for the normalization helper.
- End-to-end test that call_chat_completion forwards temperature=1.0 to the OpenAI client even when the caller passed 0.3.
Co-authored-by: Isaac
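
A rough sketch of the detector and normalization described above. The regular expression is an assumption chosen to satisfy the listed test cases, not necessarily the pattern the project uses, and the function names drop the leading underscore for readability.

```python
import logging
import re
from typing import Any, Dict

logger = logging.getLogger(__name__)

# Assumed pattern: optional "databricks-" prefix, then gpt-5 / o1 / o3 / o4,
# not followed by another letter or digit (so "gpt-50" would not match).
_REASONING_PATTERN = re.compile(
    r"^(?:databricks-)?(?:gpt-5|o1|o3|o4)(?![a-z0-9])", re.IGNORECASE
)


def is_openai_reasoning_model(endpoint: str) -> bool:
    return bool(_REASONING_PATTERN.match(endpoint))


def normalize_request_for_reasoning_model(
    endpoint: str, params: Dict[str, Any]
) -> Dict[str, Any]:
    """Force temperature=1.0 for reasoning models; leave other requests untouched."""
    if not is_openai_reasoning_model(endpoint):
        return params
    normalized = dict(params)  # copy so the caller's dict is not mutated
    if normalized.get("temperature") != 1.0:
        logger.info("Overriding temperature for %s to 1.0", endpoint)
    normalized["temperature"] = 1.0
    return normalized
```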

…162) * fix(llm): enable cross-provider interop and cache serving endpoints - Disable strict tool definitions on the trace summarization model profile so the Databricks OpenAI-compat shim accepts requests across Claude 4.6/4.7, gpt-5, gpt-5-codex, and Gemini Flash 3.5 (the shim rejects "tools.N.custom.strict" regardless of backing model). - Set litellm.drop_params=True at DSPy LM construction so hardcoded sampling params (temperature=0.2/0.3 in discovery + followup) don't 400 on gpt-5 reasoning models that require temperature=1. - Add a workspace-keyed TTL cache to DatabricksService.list_serving_endpoints with concurrent-request deduplication via per-key asyncio.Lock; the frontend's per-workshop React Query cache was triggering frequent upstream refetches of a workspace-global list. TTL configurable via DATABRICKS_ENDPOINTS_CACHE_TTL_S (default 300s). Co-authored-by: Isaac * fix(db): wire ENDPOINT_NAME binding and stop pool-timeout cascade Resolves the Lakebase pool-exhaustion cascade in gh#163. Three things were compounding under concurrent workshop load: 1. ENDPOINT_NAME was never bound in app.yaml, so credential generation silently fell back to a workspace OAuth token instead of using generate_database_credential(endpoint=...). Adds `valueFrom: postgres` binding and removes the silent fallback — missing ENDPOINT_NAME now fails loudly at engine creation. 2. _is_connection_error() matched the substring "connection timed out", which is also present in SQLAlchemy QueuePool TimeoutError. Pool saturation was being misclassified as a transient connection error, triggering engine.dispose() during the retry path and dropping in-flight connections held by other concurrent requests — amplifying the outage. TimeoutError is now explicitly rejected; DatabaseErrorMiddleware still returns a clean 503 for the client. 3. pool_pre_ping=True added a SELECT 1 per checkout (increasing checkout latency under contention) and pool_recycle=2700 forced unnecessary connection churn. 
Both realigned to AUTHENTICATION_SPEC.md:118-119 (pool_pre_ping=False, pool_recycle=3600). Co-authored-by: Isaac * fix(llm): patch Gemini Chat Completions shim quirks; pin API matrix Two client-side fixes for Gemini-backed Databricks serving endpoints, plus an integration test that pins the cross-provider API matrix. - Replace null `id` on chat completion responses with a placeholder so OpenAI SDK 2.x's Pydantic validator doesn't reject. Installed as an httpx response hook on the shared OpenAI client; mutates only chat completion shapes (other JSON like endpoint listings passes through). - Normalize `message.content` when it arrives as Gemini's array of part dicts (`[{type:"text", text:..., thoughtSignature:...}]`) into a plain string so `discovery_analysis_service`, `rubric_generation_service`, and similar callers don't have to special-case Gemini. - Add an integration test that probes every model in the workshop's picker against both Chat Completions and Responses API. Pins the design constraint that Databricks' Responses API passthrough is OpenAI-only (Claude/Gemini/Llama reject by design, so they must stay on Chat Completions). Skipped automatically when Databricks creds aren't configured; runnable on demand via `just test-integration`. Co-authored-by: Isaac * feat(summarization): route Gemini through ai-gateway for multi-turn Trace summarization on Gemini 3.5 Flash via Databricks' OpenAI-compat shim breaks on the second turn — the OpenAI Chat Completions wire format has no slot for Gemini's ``thought_signature``, and Gemini 3+ requires it round- tripped per turn. Route Gemini through the native passthrough at ``/ai-gateway/gemini`` using pydantic-ai's ``GoogleModel``, which handles ``thought_signature`` natively. Other foundation models (Claude, gpt-5, Llama) keep going through the OpenAI shim with ``OpenAIChatModel``. 
- ``TraceSummarizationService`` detects Gemini-family endpoint names at construction and builds ``GoogleModel`` over a ``google.genai.Client`` pointed at the workspace's ai-gateway/gemini URL with the Databricks token in the Authorization header. - Force httpx transport via ``HttpOptions.httpx_async_client``. google- genai prefers aiohttp when both are installed, which silently bypasses our request hook (see ``_use_aiohttp`` in google.genai._api_client). - Install an httpx request hook that strips ``id`` from outgoing ``functionCall``/``functionResponse`` parts before they reach the ai-gateway. Vertex AI's ``FunctionCall`` proto has no ``id`` field, but the google-genai SDK adds one when echoing the model's previous tool call back; the ai-gateway is a pure passthrough that doesn't strip it, so multi-turn requests 400 without this. The hook rewrites ``request.stream`` (where httpx actually reads body from), not just ``_content``, otherwise the wire body still carries the original ``id`` while Content-Length reflects the new size. - Add ``pydantic-ai-slim[google]`` extra to bring in ``google-genai``. Tests: - Unit: model-routing dispatch (Gemini → GoogleModel, others → OpenAIChatModel), Gemini client's base_url points at ai-gateway, function_call/function_response id strip, no-op on simple text turns. - Integration: live multi-turn summarization against ``databricks-gemini-3-5-flash``. Exercises the full chain end-to-end and acts as a regression guard if Databricks ships changes to the ai-gateway proto. Co-authored-by: Isaac * fix(db): correct Lakebase resource alias and centralize endpoint validation Follow-up to 53d9e70. The previous fix bound ENDPOINT_NAME via `valueFrom: postgres`, but the Apps platform exposes the Lakebase endpoint identifier under the default resource alias `database`. 
`valueFrom: postgres` resolved to an empty string at runtime, and db_bootstrap.py — which runs in the gunicorn on_starting hook before create_engine_for_backend() — bypassed the engine-creation guard and handed the empty value straight to the SDK, crashing app startup. - app.yaml: `valueFrom: postgres` → `valueFrom: database`. - LakebaseCredentialManager.get_password() now validates the endpoint argument itself, so all three call sites (do_connect handler, db_bootstrap, postgres_manager) surface the same actionable error on misconfiguration rather than the opaque SDK protobuf failure. Co-authored-by: Isaac * fix(discovery): release DB sessions between SSE polls The /discovery-comments/stream and /discovery-agent-runs/{id}/stream routes bound `db: Session = Depends(get_db)`, holding one pool connection per subscribed EventSource for the entire stream lifetime. Each DiscoveryTraceCard opens an EventSource for the comments stream (always) plus the agent-run stream (while a run is active), so a single user with ~10 visible trace cards already approaches the pool ceiling (5+5 per worker × 2 workers = 20). Combined with any background-worker connections, this is what was driving the production cascade on gh#163 after pool-timeout retries were neutered in 53d9e70. Refactor both routes to acquire SessionLocal() per poll iteration and release before the sleep, so connection holding time drops from the stream lifetime to single-digit milliseconds per query. Co-authored-by: Isaac * fix(discovery): route Gemini chat completions through ai-gateway Production reported repeated 502s from discovery analysis on Gemini: ``server received an invalid response from an upstream server``. The OpenAI-compat shim at ``/serving-endpoints/chat/completions`` translates Vertex AI responses into OpenAI shape, but for some Gemini outputs (safety blocks, certain content-part configurations) that translation fails and the shim returns a 502 instead of a usable response. 
Route Gemini chat completions through Databricks' native ai-gateway/gemini passthrough using ``google.genai.Client.models.generate_content``. The adapter returns the chat-completions dict shape callers already expect, so ``discovery_analysis_service`` and ``rubric_generation_service`` work unchanged. Other foundation models (Claude, gpt-5, Llama) stay on the OpenAI-compat shim; they don't have the response-shape issues that trip the translator for Gemini.

- ``DatabricksService.call_chat_completion`` detects Gemini endpoint names and dispatches to ``_call_gemini_chat_via_ai_gateway``.
- ``_get_gemini_client`` lazily builds and caches one ``google.genai.Client`` per workspace, pointed at ``{workspace}/ai-gateway/gemini`` with the Databricks token in the Authorization header.
- Helpers ``_messages_to_genai_contents`` and ``_genai_response_to_chat_shape`` translate between OpenAI chat messages and Gemini ``Content`` objects / ``GenerateContentResponse``. System messages collapse into ``system_instruction``; response text parts concatenate into the chat-completions string content.
- The existing ``_normalize_shim_content`` safety net stays in place for any non-Gemini model that ever returns array-shaped content.

Tests:
- Unit: Gemini endpoint names dispatch to the ai-gateway helper (and must NOT touch the OpenAI client); non-Gemini endpoints stay on the OpenAI client; helpers correctly translate messages and responses.
- Integration: live Gemini chat completion via ai-gateway returns a plain string content (the discovery_analysis_service contract).

Co-authored-by: Isaac

* chore(deps): regenerate requirements.txt to include google-genai

The Gemini ai-gateway routing for trace summarization and discovery analysis depends on the ``google.genai`` package, brought in via ``pydantic-ai-slim[google]`` in pyproject.toml. uv.lock was already updated, but requirements.txt, which the Databricks app build uses (``uv pip install -r requirements.txt``), wasn't.
The deployed app failed at runtime with ``No module named 'google.genai'``.

Regenerated via:

    uv export --format requirements-txt --no-emit-project -o requirements.txt

Co-authored-by: Isaac

* fix(discovery): force temperature=1 for gpt-5 / o-series endpoints

Production hit 400 on discovery analysis with gpt-5.5: "Unsupported value: 'temperature' does not support 0.3 with this model. Only the default (1) value is supported."

OpenAI reasoning models (gpt-5 / gpt-5.1 / gpt-5.5 / gpt-5-codex and the o1/o3/o4 series) reject any temperature != 1. LiteLLM has ``drop_params`` to handle this transparently on the DSPy path (already enabled in ``discovery_dspy._configure_litellm_drop_params``), but the OpenAI Python SDK that ``DatabricksService.call_chat_completion`` uses has no equivalent, so we have to normalize the request ourselves.

- Add ``_is_openai_reasoning_model`` detector matching ``gpt-5``, ``o1``, ``o3``, ``o4`` endpoint names (with or without the ``databricks-`` prefix).
- Add ``_normalize_request_for_reasoning_model`` which forces ``temperature=1.0`` for detected reasoning models and logs the override for auditability.
- Apply normalization in both ``call_chat_completion`` and ``call_serving_endpoint`` so all caller paths benefit.

Verified live against dogfood-staging: ``databricks-gpt-5`` and ``databricks-gpt-5-mini`` now return content for a discovery-analysis-shaped request that previously 400'd.

Tests:
- Parametrized detector tests covering gpt-5, gpt-5-codex, gpt-5.1, gpt-5.5, o1-preview, o3-mini, o4-mini.
- Negative tests confirming Claude / Llama / Gemini / gpt-4o are NOT treated as reasoning models.
- Unit test for the normalization helper.
- End-to-end test that call_chat_completion forwards temperature=1.0 to the OpenAI client even when the caller passed 0.3.

Co-authored-by: Isaac
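
The reasoning-model normalization described in the temperature fix above could look roughly like the sketch below. It is not the code from this PR: the regex, the public function names, and the choice to override only when a ``temperature`` key is present are all assumptions made for illustration, based solely on the endpoint names listed in the commit message.

```python
import logging
import re
from typing import Any, Dict

logger = logging.getLogger(__name__)

# Assumed pattern (not from the PR): gpt-5* and o1/o3/o4* endpoint names,
# with or without a "databricks-" prefix.
_REASONING_MODEL_RE = re.compile(
    r"^(?:databricks-)?(?:gpt-5|o[134])(?:$|[-.])", re.IGNORECASE
)


def is_openai_reasoning_model(endpoint_name: str) -> bool:
    """Return True when the endpoint name looks like a gpt-5 / o-series model."""
    return bool(_REASONING_MODEL_RE.match(endpoint_name.strip()))


def normalize_request_for_reasoning_model(
    endpoint_name: str, params: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of params with temperature forced to 1.0 when required.

    The caller's dict is never mutated. Non-reasoning endpoints and requests
    without an explicit temperature are returned unchanged.
    """
    normalized = dict(params)
    if not is_openai_reasoning_model(endpoint_name):
        return normalized
    requested = normalized.get("temperature")
    if requested is not None and requested != 1:
        # Log the override so the change is auditable.
        logger.info(
            "Overriding temperature=%s with 1.0 for reasoning model %s",
            requested,
            endpoint_name,
        )
        normalized["temperature"] = 1.0
    return normalized
```

Under these assumptions, calling ``normalize_request_for_reasoning_model("databricks-gpt-5", {"temperature": 0.3})`` yields a request with ``temperature`` set to ``1.0``, while a Claude or Gemini endpoint name passes through untouched.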

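The ``id``-stripping step from the Gemini summarization commit above can be illustrated with a pure function over the request body. This is a sketch under assumptions, not the hook shipped in the PR: the function name is hypothetical, and the ``contents`` / ``parts`` layout with camelCase ``functionCall`` / ``functionResponse`` keys is inferred from the commit text and the Gemini REST request shape. Wiring it into an httpx request hook (rewriting ``request.stream`` and Content-Length, as the commit describes) is deliberately left out.

```python
import json

# Part keys that may carry an SDK-added "id" (names taken from the commit text).
_CALL_KEYS = ("functionCall", "functionResponse")


def strip_function_call_ids(body: bytes) -> bytes:
    """Remove "id" from functionCall/functionResponse parts of a JSON body.

    Returns the original bytes unchanged when the body is not JSON, is not a
    JSON object, or contains no such ids (the no-op path for plain text turns).
    """
    try:
        payload = json.loads(body)
    except (ValueError, UnicodeDecodeError):
        return body
    if not isinstance(payload, dict):
        return body

    changed = False
    for content in payload.get("contents") or []:
        if not isinstance(content, dict):
            continue
        for part in content.get("parts") or []:
            if not isinstance(part, dict):
                continue
            for key in _CALL_KEYS:
                call = part.get(key)
                if isinstance(call, dict) and "id" in call:
                    del call["id"]
                    changed = True

    # Re-serialize only when something was removed, so untouched requests
    # keep their exact original bytes and length.
    return json.dumps(payload).encode("utf-8") if changed else body
```

Because the rewritten body is shorter than the original, any caller that uses a helper like this has to update the request's length header and the stream it sends, which is the pitfall the commit message calls out.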
* docs: revamp README for VibeScaler public release

Rewrite for an external/OSS audience ahead of the public v1.10 release: value-first lead (what it is, who it's for, why), a How it works section (Discovery, Annotation/IRR, Alignment via MLflow align(), Evaluate at scale), a filled-in Quick Start, expanded docs index, and Built on MLflow / Contributing / Security sections. Rename product to VibeScaler and fix the LICENSE link.

Co-authored-by: Isaac

* docs: address README review (omit Discovery link, add last-updated note)

Drop the Discovery doc link per review, add a last-updated and what-changed note at the top, and keep the SME wording from review.

Co-authored-by: Isaac

* docs: address review feedback on README

Per Forrest's review: remove the last-updated line and the dated release-zip step, reword the tagline to lead with collaboration, generalize the alignment step to optimization techniques and tracked metrics (no specific APIs), add the Databricks Marketplace as a deploy option, and use 'project' instead of 'workshop'.

Co-authored-by: Isaac

---------

Co-authored-by: yulin-yang_data <yulin.yang@databricks.com>

…t-cache' into v1.10.0

Bug bash fixes:
- #151: Copy Output copies the displayed representation (formatted vs raw) in TraceViewer
- #152/#154: multi-line criterion text round-trips intact (section-aware build/parse in rubricUtils; whitespace-pre-wrap displays)
- #153: free-form criterion type removed from rubric creation UI (legacy criteria parse as likert)
- #155: annotation completion shows terminal complete screen instead of re-triggering modal
- #156: facilitator annotation stats poll every 15s (no manual refresh)
- #157: hard-coded Results recommendations removed end-to-end (UI card, krippendorff/irr canned strings)
- #158: high-disagreement finding scoped to the requested metric (no legacy-rating leakage past sigma threshold)
- #150: fallback follow-up questions visibly badged for participants ("Standard question")
- #161: episodic-memory dedup on judge re-alignment (filter already-aligned trace IDs; repair corrupted judges)
- #163 hardening: AG-UI endpoints no longer hold a pool connection across LLM streams

Restored custom LLM provider endpoints deleted as collateral in 066e62c (spec, storage, models, and client consumers were all still active).

Test stabilization (all pre-existing failures on HEAD):
- auth tests aligned to provider-resolved session model (/api/users paths); removed-login tests deleted
- stale fakes/assertions updated (eval mode judge_model, summarization events, postgres tables)
- vitest: @CopilotKit CSS deps inlined; WorkflowProvider/EventSource mocks for discovery tests
- asyncio.run replaces deprecated get_event_loop in rubric lifecycle tests
- Node 26 localStorage shim for ProjectSetupPage/UserContext tests

Verification: just test-server 923 passed/0 failed; just ui-test-unit 350/0; just ui-typecheck and ui-lint clean.

Co-authored-by: Isaac
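
The #161 dedup above (filtering already-aligned trace IDs before re-alignment) amounts to an order-preserving set difference. The helper below is a hypothetical illustration of that idea only; its name and signature are assumptions, and it does not cover the "repair corrupted judges" part of the fix.

```python
from typing import Iterable, List


def filter_unaligned_trace_ids(
    candidate_ids: Iterable[str], aligned_ids: Iterable[str]
) -> List[str]:
    """Return candidate trace IDs that have not been aligned yet.

    Input order is preserved, and duplicates within the candidates are
    dropped so the same trace is never submitted twice in one run.
    """
    seen = set(aligned_ids)
    fresh: List[str] = []
    for trace_id in candidate_ids:
        if trace_id not in seen:
            seen.add(trace_id)
            fresh.append(trace_id)
    return fresh
```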

The V2 project-setup work (abe1cfa) replaced the E2E test lib with a minimal project-setup-only variant and left the SME/participant workshop flow unrouted, breaking `just e2e` at startup and making annotation unreachable for non-facilitator users.

- Reconcile client/tests/lib: full TestScenario builder restored (types, scenario-builder, api-mocker, indexes) with V2 grafts: withProjectSetup, projectSetup on BuiltScenario, deployment-status/auth-session/project mock routes, and buildFacilitator. One builder now serves both the legacy specs and the V2 project-setup spec.
- Provider-auth login seam: loginAs sets the mocked session user (mocked scenarios) or intercepts only /api/auth/session (real-API scenarios) so multi-SME flows work without the removed password login; real-API builds resolve the session and create a completed project setup so the V2 gates pass.
- Route SME/participant users on /workshop/:id to the phase-driven workshop experience (WorkshopDemoLanding) instead of a placeholder card.
- /deployment/status only requires Lakebase setup for postgres targets; sqlite deployments (local dev, E2E) are fully operable (tagged regression tests in test_build_deploy.py).

Verification: just e2e 2 passed/0 failed; just test-server + ui-test-unit 1277 passed/0 failed; ui-lint and ui-typecheck clean.

Co-authored-by: Isaac

- RUBRIC_SPEC: judgeType is likert|binary; legacy freeform criteria are no longer creatable and parse as likert (matches rubricUtils normalization shipped for gh#153)
- JUDGE_EVALUATION_SPEC: drop freeform from the MemAlign judge-type list
- TRACE_DISPLAY_SPEC: new success criterion "Copy Output copies the representation currently displayed (formatted vs raw)" covering gh#151, linked to TraceViewer.copyOutput.test.tsx
- regenerate SPEC_COVERAGE_MAP

Co-authored-by: Isaac

Co-authored-by: Isaac

forrestmurray-db and others added 30 commits April 10, 2026 10:51

chore: add .claude/worktrees/ to gitignore

bbd882c

Prevent worktree contents from being tracked. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(summarization): add PydanticAI-based trace summarization service…

ecf37de

… with batch support Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(summarization): add MilestoneView component with tab toggle in T…

89f26dc

…raceViewer Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(summarization): add facilitator settings UI for trace summarization

98f37fd

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

feat(summarization): regenerate API client with summarization endpoints

c3d4e6f

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

chore(deploy): exclude .claude and htmlcov from databricks sync

50883f2

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Merge fix/async-models-endpoint-and-prefetch into release/v1.10.0

c148648

Merge feature/sdk-auth-migration into release/v1.10.0

8c46900

# Conflicts: # specs/BUILD_AND_DEPLOY_SPEC.md

forrestmurray-db and others added 30 commits May 6, 2026 11:31

feat: implement V2 project setup flow

abe1cfa

Add the server-synced project setup form, API read/update routes, setup progress handling, and V2-focused E2E harness coverage for the project setup slice. Refs #129, #134

spec: document V2 control surface model

a7c8c4a

Capture the shared V2 domain language, master north star, rubric research, design handoff sources, and to-prd skill so collaborators can review the control-surface direction from the repo.

spec: expose V2 design guidepost

a37f994

Move the exported JBW V2 project design files into a stable top-level path so collaborators can use the surface prototypes as a guidepost for the V2 control surface work.

docs: add issue slicing skill and rubric research

10ddcdb

Capture the to-issues workflow and rubric calibration research so V2 planning agents can turn approved PRDs into vertical slices with the relevant grading context.

Merge feat/social-mode into project setup UI wiring

49e492c

Resolve spec and coverage baseline conflicts by keeping the project setup UI wiring branch's server-synced setup/settings behavior and latest spec coverage baseline.

fix(app): stabilize Databricks app startup diagnostics

c62a38a

fix(release): integrate DNB alignment hotfixes (#147)

7f1691f

fix(migration): use postgres-safe boolean defaults

97b397f

Restore provider-resolved auth flow

9a50539

Align the branch with the provider-resolved auth implementation so local databases can resolve the auth-removal migration and runtime code no longer depends on legacy password auth.

Add V2 setup handoff workspace defaults

41b18ca

Show the setup handoff before opening Workspace so developers can review first Sprint defaults, starter Rubric, starter Review Feed, and the rubric review readiness gate.

fix(auth): configure Databricks MLflow once per worker (#165)

7a5f1ac

Merge remote-tracking branch 'origin/release/v1.10.0' into v1.10.0

4aea21b

Merge remote-tracking branch 'origin/hotfix/model-interop-and-endpoin…

ff772bd

…t-cache' into v1.10.0

Merge origin/release/v1.10.0 (README revamp #166) into v1.10.0

85dedd6

Co-authored-by: Isaac

Release v1.10.0: SDK auth migration, summarization agent, and Lakebase fixes #125

forrestmurray-db wants to merge 141 commits into main from release/v1.10.0

forrestmurray-db commented Apr 13, 2026


3 participants
