Retrieval-Augmented Generation (RAG) systems retrieve content related to an input query, incorporate that content into a prompt, and then use an LLM to generate a final response grounded in both the user's query and the retrieved content. High-quality content retrieval is therefore critical to the overall quality of any RAG system.
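The retrieve, prompt, generate flow described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in this repository: `Document`, `build_prompt`, `answer`, `toy_retrieve`, and `toy_generate` are hypothetical names, the corpus is placeholder text, and the keyword-overlap retriever stands in for a real search backend.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Document:
    title: str
    text: str


def build_prompt(query: str, docs: List[Document]) -> str:
    """Fold the retrieved documents into a single prompt for the LLM."""
    context = "\n".join(
        f"[{i}] {doc.title}: {doc.text}" for i, doc in enumerate(docs, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )


def answer(
    query: str,
    retrieve: Callable[[str], List[Document]],
    generate: Callable[[str], str],
    top_k: int = 3,
) -> str:
    """Retrieve, build the prompt, then let the model generate."""
    docs = retrieve(query)[:top_k]
    return generate(build_prompt(query, docs))


# Placeholder corpus (hypothetical); a real system would query a search API.
CORPUS = [
    Document("NVIDIA", "Placeholder text about NVIDIA stock"),
    Document("JFK delays", "Placeholder text about delays at JFK airport"),
]


def toy_retrieve(query: str) -> List[Document]:
    """Rank documents by how many query words appear in their text."""
    words = set(query.lower().split())
    return sorted(CORPUS, key=lambda d: -len(words & set(d.text.lower().split())))


def toy_generate(prompt: str) -> str:
    """Stand-in for an LLM call; only reports the prompt size."""
    return f"(model output for a prompt of {len(prompt)} characters)"
```

Calling `answer("What are the current delays at JFK airport?", toy_retrieve, toy_generate)` runs the whole loop; swapping `toy_retrieve` for a live web search is what makes the retrieved context fresh.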
You.com’s developer-friendly APIs deliver real-time, low-latency web and news data, enabling developers to build applications that seamlessly integrate real-time public data from across the entire web at enterprise scale. Whether you’re enhancing large language models (LLMs) or building custom agents, anyone can have access to the entire web with just a few lines of code.
In this solution accelerator, we show how Databricks and You.com make it possible to:
- Access real-time knowledge from the web and news
- Empower AI agents with fresh context for better responses
- Evaluate agent performance using Databricks-native MLflow tools
Reference Blog: Unlocking Real-Time Intelligence for AI Agents with You.com and Databricks Blog
Before running this demo, you'll need:
- A Databricks workspace with Unity Catalog enabled
- Databricks CLI installed and configured (`databricks --version` >= 0.200.0)
- A You.com API key (Sign up here)
- Access to a Databricks model serving endpoint (default: `databricks-claude-3-7-sonnet`)
- Permissions to create UC functions, secrets, jobs, and MLflow experiments
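The CLI version requirement can also be checked programmatically, for example in a setup script. The sketch below is an assumption-laden helper rather than part of this repository: `parse_version`, `meets_minimum`, and `installed_cli_output` are hypothetical names, and the parser simply looks for the first `X.Y.Z` number in whatever `databricks --version` prints.

```python
import re
import subprocess
from typing import Tuple

MIN_VERSION = (0, 200, 0)  # minimum CLI version listed in the prerequisites


def parse_version(output: str) -> Tuple[int, ...]:
    """Extract the first X.Y.Z version number from CLI output."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError(f"no version number found in {output!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(output: str, minimum: Tuple[int, ...] = MIN_VERSION) -> bool:
    """Compare versions numerically, so 0.18.0 is correctly below 0.200.0."""
    return parse_version(output) >= minimum


def installed_cli_output() -> str:
    """Run `databricks --version` and return whatever it printed."""
    result = subprocess.run(
        ["databricks", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout or result.stderr
```

`meets_minimum(installed_cli_output())` returns `False` for an older CLI. Comparing integer tuples rather than strings matters here, because a plain string comparison would rank `0.18.0` above `0.200.0`.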
This project uses Databricks Asset Bundles (DAB) for streamlined deployment and lifecycle management.
1. Clone this repository

       git clone https://github.com/databricks-solutions/realtime-rag-agents-databricks-youcom.git
       cd realtime-rag-agents-databricks-youcom

2. Install/upgrade Databricks CLI

       pip install --upgrade databricks-cli
       databricks --version  # Should be >= 0.200.0

3. Authenticate with your Databricks workspace

       databricks auth login --host https://your-workspace.cloud.databricks.com

4. Store your You.com API key in Databricks Secrets

       databricks secrets create-scope demo_secrets
       databricks secrets put --scope demo_secrets --key you_com_api_key --string-value "YOUR_API_KEY"

5. Deploy the bundle

       # Deploy to dev environment (default)
       databricks bundle deploy -t dev

       # Or deploy to production
       databricks bundle deploy -t prod

6. Run the pipeline

       # Run the full agent pipeline
       databricks bundle run realtime_rag_agent_pipeline -t dev

       # Or run evaluation only
       databricks bundle run evaluate_agent_only -t dev
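Once the API key is stored in a secret scope (step 4), notebook code would typically read it back through `dbutils.secrets.get`. The helper below is a hedged sketch: `load_you_com_api_key` and `SecretsReader` are hypothetical names, and the project's own notebooks may load the key differently. It accepts any object with a `get(scope, key)` method so that it can be exercised outside a Databricks notebook.

```python
from typing import Protocol


class SecretsReader(Protocol):
    """Anything shaped like `dbutils.secrets` (only `get` is needed)."""

    def get(self, scope: str, key: str) -> str: ...


def load_you_com_api_key(
    secrets: SecretsReader,
    scope: str = "demo_secrets",      # scope name used in step 4
    key: str = "you_com_api_key",     # key name used in step 4
) -> str:
    """Read the You.com API key and reject empty or placeholder values."""
    value = secrets.get(scope=scope, key=key)
    if not value or value == "YOUR_API_KEY":
        raise ValueError(
            f"secret {scope}/{key} is empty or still holds the placeholder value"
        )
    return value
```

Inside a notebook this would be called as `load_you_com_api_key(dbutils.secrets)`. Failing early on the `YOUR_API_KEY` placeholder catches the common mistake of copying the example command without substituting a real key.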
You can customize the deployment by setting variables in the bundle. Create a databricks.yml override or use the CLI:

    # Example: Deploy with custom catalog and schema
    databricks bundle deploy -t dev \
      --var catalog=my_catalog \
      --var schema=my_schema \
      --var secret_scope=my_secrets

Available variables:
- `catalog`: Unity Catalog name (default: `main`)
- `schema`: Schema name (default: `realtime_rag_demo_dev` for dev, `realtime_rag_demo_prod` for prod)
- `secret_scope`: Secret scope containing You.com API key (required)
- `secret_key`: Secret key name (default: `you_com_api_key`)
- `cluster_node_type`: Cluster node type (default: `i3.xlarge`)
- `llm_endpoint_name`: LLM endpoint name (default: `databricks-claude-3-7-sonnet`)
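For scripted deployments, the `--var` flags shown above can be assembled from a dictionary instead of being typed by hand. This is a small sketch under stated assumptions: `bundle_deploy_command` is a hypothetical helper, not something the bundle provides, and it only builds the argument list in the same `--var name=value` form as the example command.

```python
import shlex
from typing import Dict, List


def bundle_deploy_command(target: str, variables: Dict[str, str]) -> List[str]:
    """Build `databricks bundle deploy` arguments with one --var per variable."""
    command = ["databricks", "bundle", "deploy", "-t", target]
    for name, value in variables.items():
        command += ["--var", f"{name}={value}"]
    return command


overrides = {
    "catalog": "my_catalog",
    "schema": "my_schema",
    "secret_scope": "my_secrets",
}
command = bundle_deploy_command("dev", overrides)

# shlex.join quotes arguments where needed, giving a copy-pasteable command line.
printable = shlex.join(command)
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Keeping the arguments as a list avoids shell-quoting problems if a variable value ever contains spaces.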
If you prefer to run notebooks manually instead of using the asset bundle:
- Set up secrets (same as step 4 above)
- Run notebooks in sequence
- Use the widget UI at the top of each notebook to set parameters, or create `src/common.yaml` with your configuration values
- See SETUP.md for configuration details
Execute these notebooks in order:
- `01-agent-configs`: Establishes a You.com connection via a Unity Catalog function.
- `02-define-agent`: Defines the agentic logic that integrates the UC function with a base LLM.
- `03-create-agent`: Creates and tests the agent using MLflow tracing.
- `04-evaluate-agent`: Evaluates agent performance using multiple scorers.
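To make "multiple scorers" concrete, the sketch below applies several scoring functions to a batch of agent answers and averages each one. It is plain Python and deliberately does not use the MLflow evaluation API. `non_empty`, `mentions_query_terms`, and `evaluate` are hypothetical names, the records are placeholders, and the scorers in `04-evaluate-agent` may measure entirely different things.

```python
from statistics import mean
from typing import Callable, Dict, List

# A scorer maps (question, answer) to a score between 0.0 and 1.0.
Scorer = Callable[[str, str], float]


def non_empty(question: str, answer: str) -> float:
    """1.0 if the agent produced any text at all, else 0.0."""
    return 1.0 if answer.strip() else 0.0


def mentions_query_terms(question: str, answer: str) -> float:
    """Fraction of the longer question words that reappear in the answer."""
    terms = {
        word.strip("?.,!\"'").lower() for word in question.split() if len(word) > 4
    }
    terms.discard("")
    if not terms:
        return 0.0
    lowered = answer.lower()
    return sum(1 for term in terms if term in lowered) / len(terms)


def evaluate(
    records: List[Dict[str, str]], scorers: Dict[str, Scorer]
) -> Dict[str, float]:
    """Average every scorer over all question/answer records."""
    return {
        name: mean(scorer(r["question"], r["answer"]) for r in records)
        for name, scorer in scorers.items()
    }


# Placeholder records (hypothetical), not real agent output.
records = [
    {
        "question": "What are the current delays at JFK airport?",
        "answer": "Placeholder answer about delays at the airport.",
    },
    {"question": "What is the current stock price of NVIDIA?", "answer": ""},
]
summary = evaluate(
    records,
    {"non_empty": non_empty, "mentions_query_terms": mentions_query_terms},
)
```

Reporting one average per scorer, rather than a single blended number, makes it easier to see which aspect of agent behavior changed between runs.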
This demo showcases real-time information retrieval. Try queries like:
- "What is the current status of United Airlines flight UA2749?"
- "What are the current delays at JFK airport?"
- "What are the latest developments in today's tech news?"
- "What is the current stock price of NVIDIA?"
Databricks support doesn't cover this content. For questions or bugs, please open a GitHub issue and the team will help on a best-effort basis.
© 2025 Databricks, Inc. All rights reserved. The source in this notebook is provided subject to the Databricks License [https://databricks.com/db-license-source]. All included or referenced third party libraries are subject to the licenses set forth below.
| library | description | license | source |
|---|---|---|---|
| mlflow-skinny | MLflow tracking and model management for AI applications | Apache 2.0 | https://github.com/mlflow/mlflow |
| langgraph | Framework for building stateful, multi-actor applications with LLMs | MIT | https://github.com/langchain-ai/langgraph |
| databricks-langchain | Databricks integrations for LangChain framework | Apache 2.0 | https://github.com/databricks/databricks-langchain |
| databricks-agents | Databricks Agent Framework for building AI agents | Apache 2.0 | https://docs.databricks.com/generative-ai/agent-framework/ |
