Self-Improving-LLM-Agent

Overview

This project is a web-based tool that helps users extract structured data from PDF documents based on an Excel template. Users can upload a PDF file and an Excel file specifying the desired fields and extraction instructions. The application uses a language model to map information from the PDF to the Excel structure, presents the results for review, and lets users leave feedback. Submitted feedback is used to generate improved prompts and rerun extraction, with both original and updated results shown for comparison.

Features

Upload a PDF and Excel file together to start an extraction session.
Automatically extracts data from the PDF based on column names and instructions in the Excel file.
Shows the extracted data in a readable format for validation.
Allows users to submit feedback or corrections for specific extracted fields.
Uses a secondary agent to improve the extraction prompt based on feedback and reruns the extraction.
Displays both the original and improved outputs side by side.
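
As a rough illustration of the side-by-side display, the sketch below pairs each field's original and improved value and flags the fields that changed. The function name compare_results and the sample values are hypothetical; the comparison logic in main.py may look different.

```python
def compare_results(original: dict, improved: dict) -> list[dict]:
    """Pair original and improved values per field, preserving field order."""
    # Keep the original field order, then append fields that only the improved run returned.
    fields = list(original) + [f for f in improved if f not in original]
    rows = []
    for field in fields:
        before = original.get(field)
        after = improved.get(field)
        rows.append(
            {"field": field, "original": before, "improved": after, "changed": before != after}
        )
    return rows


# Hypothetical sample data, for illustration only.
original = {"Company Name": "Acme Ltd", "Total": "1200.00"}
improved = {"Company Name": "Acme Ltd", "Total": "$1200.00"}
for row in compare_results(original, improved):
    print(row)
```

Keeping one row per field, with a changed flag, makes it easy to highlight only the values that the feedback actually affected.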

Tech Stack

Python: Entire backend logic and integrations.
Streamlit: User interface for uploading files, viewing results, and submitting feedback.
Google ADK: Agent development and orchestration.
GPT-5: For extraction and prompt refinement via Google ADK.
Azure Document Intelligence: Reads text from PDF files.
openpyxl: Reads Excel schemas and instructions.
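
The Excel schema read could be sketched as follows. This assumes the column names sit in row 1 and the optional instructions in row 2 of the active sheet; the helper names schema_from_rows and load_schema are hypothetical and not taken from the repository.

```python
def schema_from_rows(header_row, instruction_row=()):
    """Map each non-empty column name to its instruction ('' when none is given)."""
    schema = {}
    for index, name in enumerate(header_row):
        if name is None or not str(name).strip():
            continue  # skip empty header cells
        instruction = instruction_row[index] if index < len(instruction_row) else None
        schema[str(name).strip()] = str(instruction).strip() if instruction is not None else ""
    return schema


def load_schema(path):
    """Read rows 1 and 2 of the active sheet with openpyxl and build the schema."""
    from openpyxl import load_workbook  # third-party: pip install openpyxl

    workbook = load_workbook(path, read_only=True)
    sheet = workbook.active
    rows = list(sheet.iter_rows(min_row=1, max_row=2, values_only=True))
    workbook.close()
    if not rows:
        return {}
    instructions = rows[1] if len(rows) > 1 else ()
    return schema_from_rows(rows[0], instructions)
```

Splitting the pure mapping step from the file read keeps the schema logic testable without a workbook on disk.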

How to Run

Clone the repository and navigate to the project folder.
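Create and activate a virtual environment (the activation command below is for Windows; on macOS/Linux use source env/bin/activate):
python -m venv env
env\Scripts\activate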
Install required packages:
pip install -r requirements.txt
Add your Azure Document Intelligence and GPT-5 keys/configuration, along with your database connection string, to a .env file (see .env.example).
Run the app:
streamlit run main.py
Upload a PDF and Excel file using the sidebar form.
Review and correct the extracted information. When you submit feedback, the tool will show both the original extraction and the improved result (after incorporating your feedback).
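
Before starting the app, it can help to confirm that the settings from the .env step are actually present. The check below is a minimal sketch: the variable names are hypothetical placeholders (use the names listed in .env.example), and it assumes the .env values have already been loaded into the process environment.

```python
import os

# Hypothetical variable names; replace them with the ones in .env.example.
REQUIRED_SETTINGS = (
    "AZURE_DOCUMENT_INTELLIGENCE_ENDPOINT",
    "AZURE_DOCUMENT_INTELLIGENCE_KEY",
    "LLM_API_KEY",
    "DB_CONNECTION_STRING",
)


def missing_settings(env, required=REQUIRED_SETTINGS):
    """Return the required names that are unset or blank in env."""
    return [name for name in required if not (env.get(name) or "").strip()]


if __name__ == "__main__":
    missing = missing_settings(os.environ)
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All required settings are present.")
```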

Example Workflow

Upload an invoice PDF and an Excel sheet with column names such as Company Name, Date, and Total in the header row and, optionally, extraction instructions in row 2.
View the extracted data as a table or as JSON.
If any field is incorrect or incomplete, enter feedback (e.g., “Total value missing currency symbol”).
The prompt improvement agent will rewrite the extraction instructions using your feedback and re-run the extraction, showing both versions for easy comparison.
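
The workflow above can be sketched as a small loop: extract once, fold the feedback into the instructions, and extract again. This is an illustration under stated assumptions, not the project's implementation. build_prompt, run_with_feedback, and the two stand-in callables are hypothetical; in the real app the extraction and improvement steps are LLM agents, whereas here they are plain functions so the sketch runs without model access.

```python
def build_prompt(schema: dict, document_text: str) -> str:
    """Turn field names and instructions into a single extraction prompt."""
    lines = ["Extract the following fields from the document and return JSON."]
    for field, instruction in schema.items():
        lines.append(f"- {field}: {instruction}" if instruction else f"- {field}")
    lines.append("Document:")
    lines.append(document_text)
    return "\n".join(lines)


def run_with_feedback(schema, document_text, feedback, extract, improve):
    """Run extraction, apply per-field feedback to the schema, and run again.

    extract: callable(prompt) -> dict            (stands in for the extraction agent)
    improve: callable(schema, feedback) -> dict  (stands in for the improvement agent)
    """
    original = extract(build_prompt(schema, document_text))
    improved_schema = improve(schema, feedback)
    improved = extract(build_prompt(improved_schema, document_text))
    return {"original": original, "improved": improved, "schema": improved_schema}


# Stand-in callables, for illustration only.
def fake_extract(prompt):
    total = "$1200.00" if "currency symbol" in prompt else "1200.00"
    return {"Total": total}


def fake_improve(schema, feedback):
    return {
        field: f"{instruction} Reviewer note: {feedback[field]}".strip()
        if field in feedback
        else instruction
        for field, instruction in schema.items()
    }


result = run_with_feedback(
    {"Total": "Invoice total"},
    "Total due: 1200.00 USD",
    {"Total": "Include the currency symbol"},
    fake_extract,
    fake_improve,
)
print(result["original"], result["improved"])
```

Returning both results together, along with the revised instructions, is what allows the two versions to be shown next to each other.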

Folder Structure

├── main.py # Main Streamlit app
├── extraction_agent.py # Extraction agent code (LLM + ADK)
├── improvement_agent.py # Prompt improvement agent code
├── document_intelligence.py # Azure Document Intelligence wrapper: reads PDF, extracts text/blocks/metadata and returns structured page content for the extraction agent
├── database.py # Persistence layer: extracted results, feedback, and improved prompts; simple CRUD helpers for the app
├── requirements.txt # Python dependencies
└── .env # API keys and config (not committed)

With ❤️ by Team ByteHeads07
