DataFlow Architect is a Streamlit application that leverages AI to automatically generate comprehensive data science workflows and reports. Upload your dataset or document, choose your desired mode, and let the AI guide you through creating a full report or step-by-step workflow. View the app here: https://dataflow-architect-chpwn2xcy4yyarx8azsjct.streamlit.app/
- Features
- Requirements
- Installation
- Setting Up Environment Variables
- Running the Application
- Usage
- License
Features
- Comprehensive Report Generation: Generate a full, AI-assisted report covering data preprocessing, exploratory analysis, machine learning suggestions, and more.
- Step-by-Step Guidance: Receive targeted, step-by-step instructions for specific workflow sections.
- PDF Export: Download the generated report or guidance as a PDF.
- Interactive & User-Friendly: Easily upload your dataset or document, select your mode, and follow clear prompts.
Requirements
The application requires the following Python libraries:
- streamlit
- requests
- pandas
- markdown
- WeasyPrint
- python-dotenv
- openai
- mistralai
A complete requirements.txt file is included in the repository. WeasyPrint also needs native system libraries (e.g., cairo, pango, libffi) installed on your system.
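Before installing, it can help to confirm which of these packages are already importable in your environment. The sketch below is not part of the repository: `missing_modules` is a hypothetical helper, and it assumes the import names `weasyprint` and `dotenv` for WeasyPrint and python-dotenv.

```python
import importlib.util

# Import names for the libraries listed above. Some differ from their pip names:
# WeasyPrint imports as "weasyprint" and python-dotenv as "dotenv".
REQUIRED_MODULES = [
    "streamlit", "requests", "pandas", "markdown",
    "weasyprint", "dotenv", "openai", "mistralai",
]


def missing_modules(modules=REQUIRED_MODULES):
    """Return the names in `modules` that cannot be found in the current environment."""
    # find_spec locates a top-level module without executing it, so this check
    # does not trigger WeasyPrint's native library loading.
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("Missing:", ", ".join(missing) if missing else "none")
```

An empty result only means the Python packages are present; WeasyPrint's system libraries are checked separately, at import time.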
Installation
- Clone the Repository:
  git clone https://github.com/<your-username>/DataFlow-Architect.git
  cd DataFlow-Architect
- Create a Virtual Environment (recommended):
  python -m venv venv
  source venv/bin/activate    # On macOS/Linux
  venv\Scripts\activate       # On Windows
- Install the Required Libraries:
  pip install -r requirements.txt
- Install WeasyPrint Dependencies:
  - On macOS with Homebrew:
    brew install cairo pango gdk-pixbuf libffi
  - On Ubuntu/Debian:
    sudo apt-get update
    sudo apt-get install -y libcairo2 libpango-1.0-0 libffi-dev
  - Or via conda:
    conda install -c conda-forge weasyprint
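Because WeasyPrint loads its native libraries when it is imported, rendering a tiny page is a practical way to check that the dependency steps worked. This is a minimal, hypothetical check (it is not shipped with the app); it relies on `HTML(...).write_pdf()` returning the PDF as bytes when no target is given.

```python
def weasyprint_works() -> bool:
    """Render a tiny HTML page to PDF; True if WeasyPrint and its system libraries load."""
    try:
        # Importing weasyprint loads its native libraries; a missing shared
        # library surfaces as OSError, a missing package as ImportError.
        from weasyprint import HTML
    except (ImportError, OSError):
        return False
    pdf_bytes = HTML(string="<h1>DataFlow Architect</h1>").write_pdf()
    # Every PDF file starts with the "%PDF" signature.
    return pdf_bytes.startswith(b"%PDF")


if __name__ == "__main__":
    print("WeasyPrint OK" if weasyprint_works() else "WeasyPrint or its system libraries are missing")
```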
Setting Up Environment Variables
This application uses the OpenAI and Mistral APIs. Store your API keys in a local .env file and do not commit it to version control.
- Create a .env file in the project root:
  touch .env
- Add your API keys in the following format:
  OPENAI_API_KEY=sk-<your_openai_api_key>
  MISTRAL_API_KEY=mk-<your_mistralai_api_key>
- Ensure .env is ignored by adding it to your .gitignore.
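A sketch of how Python code can read these keys with python-dotenv. `load_api_keys` is a hypothetical helper; this README does not show how Data_flow.py actually loads its keys. If python-dotenv is not installed, the sketch falls back to variables already present in the process environment.

```python
import os

try:
    from dotenv import load_dotenv
except ImportError:  # python-dotenv missing: rely on variables already in the environment
    load_dotenv = None

# Variable names taken from the .env format above.
REQUIRED_KEYS = ("OPENAI_API_KEY", "MISTRAL_API_KEY")


def load_api_keys(required=REQUIRED_KEYS) -> dict:
    """Load .env (when python-dotenv is available) and return the required keys.

    Raises RuntimeError naming every key that is unset or empty.
    """
    if load_dotenv is not None:
        # By default, load_dotenv() does not override variables that are already set.
        load_dotenv()
    missing = [key for key in required if not os.getenv(key)]
    if missing:
        raise RuntimeError("Missing API keys: " + ", ".join(missing))
    return {key: os.environ[key] for key in required}
```

Failing early with the names of the missing keys is usually easier to debug than an authentication error from the first API call.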
Running the Application
- Activate your virtual environment (if using one).
- Run the Streamlit app:
  streamlit run Data_flow.py
- Open the app in your browser (typically at http://localhost:8501).
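If the `streamlit` executable is not on your PATH, `python -m streamlit run` starts the same app with the interpreter of your active environment. The helpers below are hypothetical convenience wrappers around that command, not part of the project; port 8501 matches the default mentioned above.

```python
import subprocess
import sys


def streamlit_command(script="Data_flow.py", port=8501):
    """Build the command that runs the app with this interpreter's Streamlit."""
    # "python -m streamlit" uses the Streamlit installed for sys.executable,
    # avoiding a different `streamlit` executable that may be first on PATH.
    return [sys.executable, "-m", "streamlit", "run", script, "--server.port", str(port)]


def launch(script="Data_flow.py", port=8501):
    """Start the app and block until the Streamlit server exits."""
    subprocess.run(streamlit_command(script, port), check=True)
```

Calling `launch()` from the project root is equivalent to typing the command shown above in an activated environment.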
Usage
- Select Your Mode:
  - Report Mode: Upload a CSV file and click "Generate My Report" to produce a comprehensive AI-assisted report. Then switch to the Report tab to view and download the report.
  - Step-by-Step Guidance: Upload your dataset or document, select a workflow section using the clickable cards, and click "Proceed with Selection" to generate step-by-step guidance. Then switch to the Workflow tab to view and download the combined output.
- Download Options: Both modes let you download the generated output as a PDF.
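The PDF download presumably relies on the markdown and WeasyPrint libraries from the requirements list, but this README does not describe the actual export pipeline. The sketch below assumes a Markdown-to-HTML-to-PDF conversion; `markdown_to_pdf` is a hypothetical helper, not the app's own function.

```python
import html


def markdown_to_pdf(md_text: str, title: str = "DataFlow Architect Report") -> bytes:
    """Convert a Markdown report to PDF bytes by way of HTML."""
    # Imported lazily so this module still loads where WeasyPrint's
    # native libraries are not installed.
    import markdown
    from weasyprint import HTML

    # "tables" and "fenced_code" are extensions bundled with Python-Markdown.
    body = markdown.markdown(md_text, extensions=["tables", "fenced_code"])
    document = (
        "<!DOCTYPE html><html><head><meta charset='utf-8'>"
        f"<title>{html.escape(title)}</title></head>"
        f"<body>{body}</body></html>"
    )
    # With no target argument, write_pdf() returns the PDF as bytes.
    return HTML(string=document).write_pdf()
```

In a Streamlit app, the returned bytes could be passed to `st.download_button(label="Download PDF", data=pdf_bytes, file_name="report.pdf", mime="application/pdf")`; the label and file name here are only examples.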
Happy DataFlowing!