Data Analyst Agent API

An intelligent API that leverages Large Language Models (LLMs) to function as an autonomous data analyst. This agent can source data from the web or uploaded files, prepare and clean it, perform complex analysis and calculations, and generate visualizations on demand.

This project demonstrates an advanced Planner-Executor agent architecture, where an LLM first breaks down a complex task into a structured plan, and then a series of tools execute that plan to achieve the final result.

🚀 Key Features

Multi-Step Task Planning: Dynamically creates and executes multi-step plans to handle complex data analysis requests.
Dynamic Web Scraping: Uses Playwright to render JavaScript-heavy websites and an LLM to intelligently identify and extract the correct data tables.
Code Interpreter: Generates and executes Python code in a sandboxed environment for reliable and precise data cleaning, analysis, and statistical calculations using pandas and scikit-learn.
Dynamic Visualization: Creates plots and charts on the fly using matplotlib and seaborn, returning them as base64 data URIs.
Multi-Source Data Handling: Capable of processing data from web URLs, uploaded files (.csv, .pdf, etc.), and cloud storage (e.g., S3).
API-First Design: Exposes a simple yet powerful API endpoint to receive tasks and return results.
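The Code Interpreter feature can be illustrated with a minimal sketch that runs a generated snippet in a separate Python process with a timeout. This is an illustration under stated assumptions, not the project's implementation. The name run_generated_code and its 30-second default are hypothetical. A child process in a temporary directory gives process isolation only; a hardened sandbox would also need container- or OS-level restrictions.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_generated_code(code: str, timeout_s: float = 30.0) -> dict:
    """Execute generated Python code in a child interpreter (hypothetical helper).

    Isolation is limited to a separate process, a throwaway working
    directory, and a wall-clock timeout; it is not a hardened sandbox.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "generated.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                # -I = isolated mode: ignore PYTHON* env vars and user site-packages.
                [sys.executable, "-I", str(script)],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return {"ok": False, "stdout": "", "stderr": f"timed out after {timeout_s}s"}
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "stderr": proc.stderr}


result = run_generated_code("print(sum([1, 2, 3]))")
print(result["ok"], result["stdout"].strip())  # True 6
```

The function returns stderr as data instead of raising. This lets the calling loop inspect a failure, for example to show the error to the LLM.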

🛠️ Tech Stack & Architecture

This project is built with a modern, robust tech stack designed for building AI-powered applications.

Backend: FastAPI (Python)
LLM Orchestration: Custom Planner-Executor loop with OpenAI's gpt-5-nano
Data Handling: Pandas, NumPy
Web Scraping: Playwright (for dynamic sites), BeautifulSoup4 (for parsing)
Visualization: Matplotlib, Seaborn
Machine Learning: Scikit-learn
Deployment: Docker, Hugging Face Spaces

Architecture: Planner-Executor Model

Request Input: The API receives a natural language task (e.g., "Scrape this URL, join with this CSV, and plot the results") and optional file attachments.
Planner Agent: An LLM call analyzes the request and breaks it down into a structured JSON plan. For example: [{"tool": "scrape_web"}, {"tool": "read_csv"}, {"tool": "run_python_code"}].
Executor Loop: The Python backend iterates through the plan, calling the appropriate tool for each step.
Tool Execution: Each tool (e.g., scrape_web, run_python_code) performs its specific task, storing its results in a shared context.
Code Interpreter: The run_python_code tool asks the LLM to write Python code to perform the final analysis, which is then executed in a secure sandbox.
Response Output: The final result, which can be a JSON array of text, numbers, or base64-encoded images, is returned to the user.
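The steps above can be sketched as a small planner-executor loop. This is a hypothetical illustration rather than the project's code:

- The planner returns a hard-coded JSON string where the real system would call the LLM.
- The tool bodies are placeholders.
- The per-step args field, ToolError, and the helper names are assumptions.

Only the tool names scrape_web, read_csv and run_python_code come from the description above.

```python
import json
from typing import Any, Callable

Context = dict[str, Any]  # shared context passed to every tool
Tool = Callable[[Context, dict], None]


class ToolError(Exception):
    """Raised when a plan names a tool that is not registered (hypothetical)."""


def scrape_web(ctx: Context, args: dict) -> None:
    # Placeholder: a real tool would render the page with Playwright and extract a table.
    ctx["web_rows"] = [{"source": args["url"]}]


def read_csv(ctx: Context, args: dict) -> None:
    # Placeholder: a real tool would load the uploaded file with pandas.
    ctx["csv_rows"] = [{"source": args["path"]}]


def run_python_code(ctx: Context, args: dict) -> None:
    # Placeholder: a real tool would ask the LLM for code and run it in a sandbox.
    total = len(ctx.get("web_rows", [])) + len(ctx.get("csv_rows", []))
    ctx["answers"] = [f"rows available: {total}"]


TOOLS: dict[str, Tool] = {
    "scrape_web": scrape_web,
    "read_csv": read_csv,
    "run_python_code": run_python_code,
}


def plan_task(task: str) -> list[dict]:
    # Stub planner: ignores `task` and returns fixed JSON where the real system would call the LLM.
    llm_output = (
        '[{"tool": "scrape_web", "args": {"url": "https://en.wikipedia.org/wiki/List_of_highest-grossing_films"}},'
        ' {"tool": "read_csv", "args": {"path": "data.csv"}},'
        ' {"tool": "run_python_code", "args": {}}]'
    )
    return json.loads(llm_output)


def execute_plan(plan: list[dict]) -> Context:
    # Validate the whole plan first so an unknown tool fails before any step runs.
    unknown = [step["tool"] for step in plan if step["tool"] not in TOOLS]
    if unknown:
        raise ToolError(f"unknown tool(s): {unknown}")
    ctx: Context = {}
    for step in plan:
        TOOLS[step["tool"]](ctx, step.get("args", {}))
    return ctx


result = execute_plan(plan_task("Scrape this URL, join with this CSV, and plot the results"))
print(result["answers"])  # ['rows available: 2']
```

The executor only sees a name-to-function registry, so adding a tool means registering one function. A plan that names an unregistered tool is rejected before any step executes.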

🏁 Getting Started & Usage

The agent is exposed via a single API endpoint that accepts multipart/form-data POST requests. You can interact with it using any HTTP client, such as curl.

API Endpoint: https://karthix1-data-analyst-agent.hf.space/api/

Example Usage: Web Scraping, Analysis, and Visualization

This example asks the agent to scrape a Wikipedia page, answer several analytical questions, and generate a plot.

Create a questions.txt file:

```
Scrape the list of highest grossing films from Wikipedia. It is at the URL:
https://en.wikipedia.org/wiki/List_of_highest-grossing_films

Answer the following questions:
1. How many $2 bn movies were released before 2000?
2. Which is the earliest film that grossed over $1.5 bn?
3. What's the correlation between the Rank and Peak?
4. Draw a scatterplot of Rank and Peak along with a dotted red regression line through it.
   Return as a base-64 encoded data URI.
```

Send the request using curl:

```
curl -X POST "https://karthix1-data-analyst-agent.hf.space/api/" \
     -F "questions.txt=@questions.txt"
```
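The curl call above can also be reproduced from Python with only the standard library. This sketch makes several assumptions:

- build_request and send are hypothetical helper names.
- The 300-second timeout is a guess.
- The multipart body mirrors what -F "questions.txt=@questions.txt" sends: a form field named questions.txt carrying the file.

Building the request separately from sending it lets you inspect the payload offline.

```python
import urllib.request
import uuid

API_URL = "https://karthix1-data-analyst-agent.hf.space/api/"


def build_request(url: str, field_name: str, filename: str, content: bytes) -> urllib.request.Request:
    """Build a multipart/form-data POST equivalent to curl -F "field=@file" (hypothetical helper)."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: text/plain\r\n\r\n",  # part type for a .txt upload
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def send(req: urllib.request.Request, timeout: float = 300.0) -> str:
    """Send the request and return the raw response body (expected to be a JSON array)."""
    # Generous timeout (a guess): scraping, analysis and plotting can take a while.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Usage: pass open("questions.txt", "rb").read() as content to build_request, then hand the result to send. Changing the url argument points the same code at a locally running instance.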

Expected Response: A JSON array with one element per question. The last element is a long data URI string containing the generated plot. The values come from the live Wikipedia page, so they may differ from this illustrative output:

```
[
  "Answer 1: 1",
  "Answer 2: Titanic (1997)",
  "Answer 3: Correlation: 0.5389",
  "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAA... (and so on)"
]
```
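Plots come back as data URIs inside the JSON array. To recover the image bytes, a client splits off the data:image/png;base64, header and base64-decodes the rest. Below is a minimal standard-library sketch. decode_data_uri and extract_images are hypothetical helper names, and the code assumes base64-encoded data URIs like the one in the example response.

```python
import base64
import json

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file


def decode_data_uri(uri: str) -> tuple[str, bytes]:
    """Split a base64 data URI into (media type, raw bytes)."""
    header, sep, payload = uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URI")
    media_type = header[len("data:"):-len(";base64")]
    return media_type, base64.b64decode(payload, validate=True)


def extract_images(response_text: str) -> list[tuple[int, str, bytes]]:
    """Return (answer index, media type, bytes) for each data-URI answer in a JSON-array response."""
    images = []
    for i, answer in enumerate(json.loads(response_text)):
        if isinstance(answer, str) and answer.startswith("data:"):
            media_type, data = decode_data_uri(answer)
            images.append((i, media_type, data))
    return images
```

To save a plot, write each returned data value to a file such as answer_3.png in binary mode. A quick sanity check that the decode worked is that the bytes start with PNG_SIGNATURE.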

🔧 Running Locally

To run this project on your own machine:

Clone the repository:

```
git clone https://github.com/Karthix1/data-analyst-agent.git
cd data-analyst-agent
```

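Set up environment variables: Create a .env file in the root of the repository and add your API keys. Leave the values unquoted. docker run --env-file passes values through literally, so any quotes would become part of the key. OPENAI_BASE_URL is only needed if you route requests through a proxy.

```
OPENAI_API_KEY=your_openai_or_aipipe_token
OPENAI_BASE_URL=optional_base_url_if_using_a_proxy
```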
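How the app reads these variables is not documented here. A hypothetical sketch using the standard library might look like this. load_llm_settings is an illustrative name, and treating an empty OPENAI_BASE_URL as unset is an assumption. The two values correspond to the api_key and base_url arguments of the official openai Python client's OpenAI(...) constructor, where None means "use the client's default".

```python
import os
from collections.abc import Mapping
from typing import Optional


def _clean(value: str) -> str:
    # Defensive: `docker run --env-file` does not strip quotes, so a quoted value
    # in .env would otherwise reach the app with the quote characters included.
    value = value.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
        value = value[1:-1]
    return value


def load_llm_settings(env: Mapping[str, str] = os.environ) -> dict[str, Optional[str]]:
    """Read the LLM settings from the environment (hypothetical helper, not the project's code)."""
    api_key = _clean(env.get("OPENAI_API_KEY", ""))
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    # Unset or empty OPENAI_BASE_URL -> None, i.e. use the provider's default endpoint.
    base_url = _clean(env.get("OPENAI_BASE_URL", "")) or None
    return {"api_key": api_key, "base_url": base_url}
```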

Build and run with Docker (Recommended): This ensures all dependencies, including Playwright's browsers, are correctly installed.
```
docker build -t data-analyst-agent .
docker run -p 8000:7860 -v $(pwd):/app --env-file .env data-analyst-agent
```
The API will be available at http://localhost:8000. The -p 8000:7860 flag maps host port 8000 to port 7860 inside the container.

📜 License

This project is licensed under the MIT License. See the LICENSE file for details.
