A Flask web application that uses CTGAN (Conditional Tabular GAN) to generate synthetic data from uploaded CSV files or manually entered data.
- Upload CSV files and train CTGAN models
- Manually enter tabular data for model training
- Generate synthetic data based on trained models
- Web-based interface with Bootstrap styling
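
The train-then-generate flow in the list above can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: it assumes the `ctgan` package's `CTGAN` class with its `fit(data, discrete_columns)` and `sample(n)` methods, and both `guess_discrete_columns` and `train_and_sample` are hypothetical helper names. The rule "a column is discrete if any value is non-numeric" is also an assumption made for the example.

```python
import csv
import io


def guess_discrete_columns(rows):
    """Return the names of columns holding at least one non-numeric value.

    Hypothetical heuristic for illustration; the app may pick columns differently.
    """
    if not rows:
        return []
    discrete = []
    for name in rows[0]:
        try:
            for row in rows:
                float(row[name])
        except (TypeError, ValueError):
            # A value that does not parse as a number marks the column as categorical.
            discrete.append(name)
    return discrete


def train_and_sample(csv_text, num_rows, epochs=10):
    """Fit a CTGAN model on CSV text and return `num_rows` synthetic rows."""
    # Imported lazily so the helper above works without these packages installed.
    import pandas as pd
    from ctgan import CTGAN

    rows = list(csv.DictReader(io.StringIO(csv_text)))
    data = pd.read_csv(io.StringIO(csv_text))
    model = CTGAN(epochs=epochs)
    model.fit(data, guess_discrete_columns(rows))
    return model.sample(num_rows)
```

Calling `train_and_sample(open("data.csv").read(), 100)` would return a DataFrame of 100 synthetic rows, provided `ctgan` and `pandas` are installed and the file is large enough to train on.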
- Python 3.9+
- pip
- Clone the repository:

      git clone https://github.com/anshtrivediaiml/Synthetic-Data-Generator.git
      cd ctgan_flask_app

- Create a virtual environment:

      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

- Install dependencies:

      pip install -r requirements.txt

- Set environment variables (optional for development):

      export SECRET_KEY="your-secret-key-here"
      export FLASK_DEBUG="True"

- Run the application:

      python app.py

- Open http://localhost:5000 in your browser.
- Create a Railway account at https://railway.app
- Connect your GitHub repository
- Railway will automatically detect the Dockerfile and deploy
- Set environment variables in Railway dashboard:
  - SECRET_KEY: A secure random string
  - FLASK_DEBUG: False
- Install Fly CLI: https://fly.io/docs/hands-on/install-flyctl/
- Login:

      fly auth login

- Launch:

      fly launch

- Set secrets:

      fly secrets set SECRET_KEY="your-secret-key"
      fly secrets set FLASK_DEBUG="False"

- Deploy:

      fly deploy
- Install Heroku CLI
- Create app:

      heroku create your-app-name

- Set buildpack:

      heroku buildpacks:set heroku/python

- Push to Heroku:

      git push heroku main

- Set environment variables:

      heroku config:set SECRET_KEY="your-secret-key"
      heroku config:set FLASK_DEBUG="False"
docker build -t ctgan-app .
docker run -p 8000:7860 ctgan-app

- SECRET_KEY: Flask secret key for sessions (required in production)
- FLASK_DEBUG: Set to True for development, False for production
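
One way an application could read these two variables at startup is sketched below. The function name `load_config` and the development fallback key are assumptions made for illustration; the README does not show how app.py actually handles them.

```python
import os


def load_config(environ=None):
    """Build a settings dict from SECRET_KEY and FLASK_DEBUG."""
    env = os.environ if environ is None else environ
    # Only the literal string "true" (any casing) enables debug mode.
    debug = env.get("FLASK_DEBUG", "False").strip().lower() == "true"
    secret_key = env.get("SECRET_KEY")
    if not secret_key:
        if not debug:
            # Fail fast rather than run production sessions on a guessable key.
            raise RuntimeError("SECRET_KEY must be set when FLASK_DEBUG is not True")
        secret_key = "dev-only-insecure-key"  # hypothetical local-development fallback
    return {"SECRET_KEY": secret_key, "DEBUG": debug}
```

Refusing to start without a key outside debug mode matches the "required in production" note, and passing a plain dict instead of `os.environ` keeps the function easy to test.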
The application provides a health check endpoint at /health that returns {"status": "healthy"}.
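
A deployment script or monitor could probe this endpoint with a few lines of standard-library Python. The sketch below assumes the app is reachable at http://localhost:5000 (the local development address); `is_healthy` and `check_health` are hypothetical helper names, not part of the application.

```python
import json
import urllib.request


def is_healthy(body):
    """Return True when the response body is JSON with "status" set to "healthy"."""
    try:
        payload = json.loads(body)
    except (ValueError, TypeError):
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_health(base_url="http://localhost:5000", timeout=5.0):
    """Fetch /health and report whether the app answered as healthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as response:
            return response.status == 200 and is_healthy(response.read())
    except OSError:
        # Connection failures and HTTP errors both count as unhealthy.
        return False
```

For a hosted deployment, pass that deployment's URL as `base_url`.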
ctgan_flask_app/
├── app.py # Main Flask application
├── requirements.txt # Python dependencies
├── Dockerfile # Docker configuration
├── Procfile # Heroku deployment config
├── .gitignore # Git ignore rules
├── README.md # This file
├── static/ # Static files (CSS, images)
├── templates/ # HTML templates
├── uploads/ # Uploaded files (ignored in git)
└── outputs/ # Generated models and data (ignored in git)
The application stores trained models and generated data in local directories (uploads/ and outputs/). For production deployments:
- Railway: Supports persistent disks. Configure a volume for /code/uploads and /code/outputs
- Fly.io: Supports persistent volumes. Add volumes in fly.toml for data persistence
- Heroku: Uses ephemeral storage, so data will be lost on dyno restarts. Consider using cloud storage like AWS S3
- Docker: Mount host directories as volumes for persistence
- Docker: Mount host directories as volumes for persistence
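
Whichever platform is used, it helps if the storage location is configurable so that uploads/ and outputs/ can live on the mounted volume. The sketch below shows one possible approach; the `DATA_DIR` environment variable and the `storage_dirs` function are assumptions made for illustration and are not documented settings of this application.

```python
import os
from pathlib import Path


def storage_dirs(environ=None):
    """Return (uploads, outputs) paths under DATA_DIR, creating them if missing."""
    env = os.environ if environ is None else environ
    # DATA_DIR is a hypothetical variable; it defaults to the current directory.
    base = Path(env.get("DATA_DIR", "."))
    uploads = base / "uploads"
    outputs = base / "outputs"
    for directory in (uploads, outputs):
        directory.mkdir(parents=True, exist_ok=True)
    return uploads, outputs
```

Under this scheme, setting `DATA_DIR=/code` with a volume mounted there would place the data at /code/uploads and /code/outputs, the paths mentioned for Railway above.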
For high-traffic deployments, consider:
- Using a database instead of file storage
- Implementing caching for model loading
- Using async processing for model training
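
Two of these suggestions, cached model loading and background training, can be sketched with the standard library alone. This is an illustrative outline under stated assumptions: `pickle` stands in for whatever format the app really uses to save models, and `load_model` and `submit_training` are hypothetical names.

```python
import pickle
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


@lru_cache(maxsize=8)
def load_model(path):
    """Load a saved model once per path; later calls reuse the cached object."""
    # pickle is a stand-in here; swap in the real model loader.
    with open(path, "rb") as handle:
        return pickle.load(handle)


# A single worker keeps heavy training jobs from piling up on a small instance.
_executor = ThreadPoolExecutor(max_workers=1)


def submit_training(train_fn, *args):
    """Run train_fn(*args) in the background and return a Future immediately."""
    return _executor.submit(train_fn, *args)
```

A request handler could call `submit_training(...)`, respond right away, and let the client poll for the result. The cache and the executor both live inside one process, so a deployment with several worker processes would likely need a shared job queue instead.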
- File uploads are limited to 16MB
- Sensitive data should not be uploaded to public repositories
- Use a strong, randomly generated SECRET_KEY in production
- Implement proper authentication if needed
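
The first and third points can be illustrated with a short sketch. Flask itself can enforce a request size cap through its `MAX_CONTENT_LENGTH` setting; whether this application does so is not stated here. The helpers below are hypothetical, and the `.csv` extension check is an assumption based on the app accepting CSV uploads.

```python
import secrets

MAX_UPLOAD_BYTES = 16 * 1024 * 1024  # the 16MB upload limit


def generate_secret_key(num_bytes=32):
    """Return a random hex string suitable for use as SECRET_KEY."""
    return secrets.token_hex(num_bytes)


def upload_allowed(filename, size_bytes):
    """Accept only non-empty .csv files within the upload size limit."""
    return filename.lower().endswith(".csv") and 0 < size_bytes <= MAX_UPLOAD_BYTES
```

Running `print(generate_secret_key())` once and storing the result as the platform's SECRET_KEY secret avoids hand-typed or reused keys.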