Synthetic Data Generator

A Flask web application that uses CTGAN (Conditional Tabular GAN) to generate synthetic data from uploaded CSV files or manually entered data.

Features

  • Upload CSV files and train CTGAN models
  • Manually enter tabular data for model training
  • Generate synthetic data based on trained models
  • Web-based interface with Bootstrap styling

Local Development

Prerequisites

  • Python 3.9+
  • pip

Setup

  1. Clone the repository:

    git clone https://github.com/anshtrivediaiml/Synthetic-Data-Generator.git
    cd ctgan_flask_app
  2. Create a virtual environment:

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Set environment variables (optional for development):

    export SECRET_KEY="your-secret-key-here"
    export FLASK_DEBUG="True"
  5. Run the application:

    python app.py
  6. Open http://localhost:5000 in your browser.
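The placeholder value in step 4 is best replaced with a randomly generated string. One possible way to produce such a value uses Python's standard `secrets` module; the helper name `generate_secret_key` is illustrative and not part of this project:

```python
import secrets


def generate_secret_key(num_bytes: int = 32) -> str:
    """Return a random hex string suitable for use as SECRET_KEY."""
    # token_hex(n) returns 2 * n hexadecimal characters.
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Copy the printed value into the SECRET_KEY environment variable.
    print(generate_secret_key())
```

Generate the key once and store it in the environment rather than regenerating it on each start, since a changed key invalidates existing sessions.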

Production Deployment

Using Railway

  1. Create a Railway account at https://railway.app
  2. Connect your GitHub repository
  3. Railway will automatically detect the Dockerfile and deploy
  4. Set environment variables in Railway dashboard:
    • SECRET_KEY: A secure random string
    • FLASK_DEBUG: False

Using Fly.io

  1. Install Fly CLI: https://fly.io/docs/hands-on/install-flyctl/
  2. Login: fly auth login
  3. Launch: fly launch
  4. Set secrets:
    fly secrets set SECRET_KEY="your-secret-key"
    fly secrets set FLASK_DEBUG="False"
  5. Deploy: fly deploy

Using Heroku

  1. Install Heroku CLI
  2. Create app: heroku create your-app-name
  3. Set buildpack: heroku buildpacks:set heroku/python
  4. Push to Heroku: git push heroku main
  5. Set environment variables:
    heroku config:set SECRET_KEY="your-secret-key"
    heroku config:set FLASK_DEBUG="False"

Using Docker Locally

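    docker build -t ctgan-app .
    docker run -p 8000:7860 ctgan-app

The second command maps host port 8000 to container port 7860, so open http://localhost:8000 once the container is running.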

Environment Variables

  • SECRET_KEY: Flask secret key for sessions (required in production)
  • FLASK_DEBUG: Set to True for development, False for production
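How app.py reads these variables is not shown here. The following is a minimal sketch of one way to do it, assuming the app should refuse to start without SECRET_KEY unless debug mode is on. The function names `parse_bool` and `load_config` and the fallback key are hypothetical:

```python
import os


def parse_bool(value: str) -> bool:
    """Interpret common truthy strings such as "True", "1", or "yes"."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_config(environ=None) -> dict:
    """Build a settings dict from SECRET_KEY and FLASK_DEBUG."""
    env = os.environ if environ is None else environ
    debug = parse_bool(env.get("FLASK_DEBUG", "False"))
    secret_key = env.get("SECRET_KEY")
    if not secret_key:
        if not debug:
            # Assumed policy: refuse to start in production without a key.
            raise RuntimeError("SECRET_KEY must be set when FLASK_DEBUG is False")
        # Assumed development fallback; never use this value in production.
        secret_key = "dev-only-insecure-key"
    return {"SECRET_KEY": secret_key, "DEBUG": debug}
```

The returned dict could be passed to Flask's `app.config.update(...)`, since `SECRET_KEY` and `DEBUG` are both standard Flask configuration keys.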

Health Check

The application provides a health check endpoint at /health that returns {"status": "healthy"}.
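A deployment script or uptime monitor can poll this endpoint. The sketch below is a client-side probe built only on the standard library. The function names and the default base URL (the local development address) are assumptions for illustration:

```python
import json
import urllib.request


def is_healthy(body: bytes) -> bool:
    """Return True if the response body is JSON with status "healthy"."""
    try:
        payload = json.loads(body)
    except ValueError:
        # Raised for malformed JSON and for undecodable bytes.
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_health(base_url: str = "http://localhost:5000", timeout: float = 5.0) -> bool:
    """Request /health and report whether the app answered as healthy."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read())
    except OSError:
        # Covers connection errors, timeouts, and HTTP error statuses.
        return False
```

When running under Docker with the port mapping shown earlier, pass `base_url="http://localhost:8000"` instead.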

File Structure

ctgan_flask_app/
├── app.py              # Main Flask application
├── requirements.txt    # Python dependencies
├── Dockerfile          # Docker configuration
├── Procfile            # Heroku deployment config
├── .gitignore          # Git ignore rules
├── README.md           # This file
├── static/             # Static files (CSS, images)
├── templates/          # HTML templates
├── uploads/            # Uploaded files (ignored by Git)
└── outputs/            # Generated models and data (ignored by Git)

Deployment Considerations

Persistent Storage

The application stores trained models and generated data in local directories (uploads/ and outputs/). For production deployments:

  • Railway: Supports persistent disks. Configure a volume for /code/uploads and /code/outputs
  • Fly.io: Supports persistent volumes. Add volumes in fly.toml for data persistence
  • Heroku: Uses an ephemeral filesystem, so data is lost whenever a dyno restarts. Consider external storage such as AWS S3
  • Docker: Mount host directories as volumes for persistence
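One way to make the storage location follow a mounted volume is to derive both directories from a single base path. The sketch below assumes a hypothetical `DATA_DIR` environment variable, which this project does not define, and creates the two directories if the volume is empty:

```python
import os
from pathlib import Path


def resolve_storage_dirs(environ=None) -> tuple[Path, Path]:
    """Return (uploads, outputs) paths under a configurable base directory."""
    env = os.environ if environ is None else environ
    # DATA_DIR is a hypothetical variable; default to the current directory.
    base = Path(env.get("DATA_DIR", "."))
    uploads, outputs = base / "uploads", base / "outputs"
    for directory in (uploads, outputs):
        # Create the directory (and parents) if the volume is freshly mounted.
        directory.mkdir(parents=True, exist_ok=True)
    return uploads, outputs
```

With a volume mounted at, say, `/data`, setting `DATA_DIR=/data` would place both directories on persistent storage without changing the application's relative layout.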

Scaling

For high-traffic deployments, consider:

  • Using a database instead of file storage
  • Implementing caching for model loading
  • Using async processing for model training
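As an illustration of the caching suggestion, the sketch below wraps any model-loading function in an in-memory cache using `functools.lru_cache`. The name `make_cached_loader` and the loader signature (a path string in, a model object out) are assumptions; the project's actual loading code is not shown here:

```python
from functools import lru_cache
from typing import Any, Callable


def make_cached_loader(loader: Callable[[str], Any], maxsize: int = 8) -> Callable[[str], Any]:
    """Wrap a model-loading function so repeated paths hit an in-memory cache."""

    @lru_cache(maxsize=maxsize)
    def cached(path: str) -> Any:
        # Only reached on a cache miss; the result is stored per path.
        return loader(path)

    return cached
```

The returned function exposes `cache_clear()`, which should be called after a model is retrained and saved to the same path, otherwise the stale in-memory copy keeps being served. The cache is per process, so each worker in a multi-worker deployment holds its own copy.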

Security Notes

  • File uploads are limited to 16MB
  • Sensitive data should not be uploaded to public repositories
  • Use a strong, randomly generated SECRET_KEY in production
  • Add authentication before exposing the app to untrusted users
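In Flask, a request-size cap like the 16MB limit is usually set through the `MAX_CONTENT_LENGTH` configuration key. The sketch below shows the same rule as a framework-independent check. The helper name and the CSV-only extension rule are assumptions based on the features described above, not code taken from app.py:

```python
MAX_UPLOAD_BYTES = 16 * 1024 * 1024  # 16 MB, matching the documented limit
ALLOWED_EXTENSIONS = {"csv"}  # assumption: only CSV uploads are accepted


def is_allowed_upload(filename: str, size_bytes: int) -> bool:
    """Accept only non-empty .csv files no larger than MAX_UPLOAD_BYTES."""
    has_csv_ext = (
        "." in filename
        and filename.rsplit(".", 1)[1].lower() in ALLOWED_EXTENSIONS
    )
    return has_csv_ext and 0 < size_bytes <= MAX_UPLOAD_BYTES
```

An extension check only filters obvious mistakes; the file contents should still be parsed defensively, since a file named `.csv` can contain anything.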

About

My 6th-semester AIML project, built with Flask, which takes a small sample of data and uses CTGAN to generate a large amount of realistic synthetic data.