This project connects to a PostgreSQL database, reads media file URLs from the defined schemas, downloads the referenced images from S3-compatible object storage, compresses and converts them to WebP format, then re-uploads them and updates the database records with the new URLs.
```
src/
├── aws/              # AWS S3 operations (download, upload, compression)
│   ├── __init__.py   # Main AWS module with S3 client and image processing
│   └── aws.py        # Legacy code (can be removed)
├── config/           # Configuration directory
│   └── .env          # Environment variables (create this file)
├── db/               # Database connection and session management
│   ├── __init__.py
│   └── db.py         # Database initialization and session factory
├── schemas/          # SQLAlchemy ORM models
│   ├── __init__.py   # Base model and data fetching utilities
│   ├── events.py     # Event model (cover, poster fields)
│   ├── posts.py      # Post model (media_file field)
│   └── profiles.py   # AccountProfile model (avatar, banner fields)
├── main.py           # Main script that orchestrates the processing
└── pyproject.toml    # Project dependencies
```
- Database Integration: Connects to PostgreSQL database using SQLAlchemy
- Schema Support: Processes media files from multiple schemas:
  - AccountProfile: avatar and banner images
  - Post: media_file images (filtered by media_type='image')
  - Event: cover and poster images
- S3 Integration: Downloads and uploads files from S3-compatible object storage
- Image Compression: Converts images to WebP format with configurable quality
- Automatic Updates: Updates database records with new WebP file URLs
- Error Handling: Robust error handling with rollback on failures
- Statistics: Provides processing statistics for each schema
- Install dependencies using `uv` (or your preferred package manager):

  ```bash
  cd src
  uv sync
  ```

- Create a `.env` file in the `config/` directory with the following required variables:
```
# Database Configuration
DATABASE_URL=postgresql://user:password@localhost:5432/dbname

# AWS S3 Configuration
AWS_STORAGE_BUCKET_NAME=your-bucket-name
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
AWS_S3_ENDPOINT_URL=https://s3.amazonaws.com
AWS_SERVICE_NAME=s3

# Image Processing Configuration
WEBP_QUALITY=80
```

- `DATABASE_URL`: PostgreSQL connection string
- `AWS_STORAGE_BUCKET_NAME`: Name of your S3 bucket
- `AWS_ACCESS_KEY_ID`: S3 access key ID
- `AWS_SECRET_ACCESS_KEY`: S3 secret access key
- `AWS_S3_ENDPOINT_URL`: S3 endpoint URL (default: `https://s3.amazonaws.com`)
  - For AWS S3: `https://s3.amazonaws.com` or `https://s3.region.amazonaws.com`
  - For DigitalOcean Spaces: `https://region.digitaloceanspaces.com`
  - For MinIO: `http://localhost:9000`
- `AWS_SERVICE_NAME`: S3 service name (default: `s3`)
- `WEBP_QUALITY`: WebP compression quality (0-100, default: 80)
  - Lower values = smaller files but lower quality
  - Higher values = better quality but larger files
Run the main script:

```bash
cd src
python main.py
```

The script will:

- Connect to the database
- Fetch records from all schemas (AccountProfile, Post, Event)
- For each media URL:
  - Download the image from S3
  - Compress and convert to WebP format
  - Upload the compressed image to S3
  - Update the database record with the new URL
- Print processing statistics
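The per-URL flow above can be sketched as a small function. The helper names here (`download`, `compress_to_webp`, `upload`, `update_record`) are illustrative stand-ins for the project's actual functions, injected as arguments so the flow is easy to follow and test:

```python
def process_media_url(url, stats, download, compress_to_webp, upload, update_record):
    """Sketch of the per-URL workflow; helpers are injected stand-ins."""
    if not url or url.endswith(".webp"):
        stats["skipped"] += 1  # no URL, or already WebP
        return
    try:
        data = download(url)           # fetch original bytes from S3
        webp = compress_to_webp(data)  # convert to WebP
        new_url = upload(url, webp)    # upload and derive the new URL
        update_record(new_url)         # persist the new URL in the database
        stats["processed"] += 1
    except Exception:
        stats["failed"] += 1           # log and continue with the next file
```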
- The `db/db.py` module initializes a SQLAlchemy engine and session factory
- Uses connection pooling for efficient database access
- Loads the database URL from `config/.env`
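A plausible sketch of that setup (not the project's exact code); the in-memory SQLite fallback is only so the snippet runs standalone:

```python
import os

from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

# In the real project the URL comes from config/.env via python-dotenv.
DATABASE_URL = os.getenv("DATABASE_URL", "sqlite://")  # fallback for illustration

# create_engine maintains a connection pool by default;
# pool_pre_ping revalidates pooled connections before use.
engine = create_engine(DATABASE_URL, pool_pre_ping=True)
SessionLocal = sessionmaker(bind=engine)

with SessionLocal() as session:
    session.execute(text("SELECT 1"))  # smoke-test the connection
```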
- The `schemas/__init__.py` module provides a `fetch_db_data()` function
- Fetches records with specific fields and optional filter conditions
- Supports loading only required fields for efficiency
- URL Extraction: Extracts S3 key from full URL (supports multiple URL formats)
- Download: Downloads image from S3 as bytes
- Compression: Converts image to WebP format with configurable quality
- Upload: Uploads compressed image to S3
- Database Update: Updates database record with new URL
- Virtual-hosted-style: `https://bucket.s3.amazonaws.com/key`
- Path-style: `https://s3.amazonaws.com/bucket/key`
- Custom endpoints: `https://endpoint/bucket/key` or `https://bucket.endpoint/key`
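Key extraction for these formats could look roughly like this (a best-effort sketch, not the project's exact implementation):

```python
from urllib.parse import urlparse

def extract_s3_key(url: str, bucket: str) -> str:
    """Pull the object key out of the URL styles listed above."""
    parsed = urlparse(url)
    path = parsed.path.lstrip("/")
    if parsed.netloc.startswith(bucket + "."):
        return path                    # virtual-hosted-style / bucket.endpoint
    if path.startswith(bucket + "/"):
        return path[len(bucket) + 1:]  # path-style / endpoint/bucket/key
    return path                        # key sits directly in the path
```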
AccountProfile:

- `id`: Integer (primary key)
- `avatar`: String(255) - Avatar image URL
- `banner`: String(255) - Banner image URL

Post:

- `id`: Integer (primary key)
- `media_type`: String(20) - Media type (filtered for 'image')
- `media_file`: String(255) - Media file URL

Event:

- `id`: Integer (primary key)
- `cover`: String(255) - Cover image URL
- `poster`: String(255) - Poster image URL
- Preserves transparency when present (RGBA mode)
- Converts images to RGB for non-transparent images
- Configurable quality setting (0-100)
- Uses method 6 (best compression) for WebP encoding
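Those rules map onto Pillow roughly as follows (a sketch under the settings listed above, not the project's exact code):

```python
from io import BytesIO

from PIL import Image

def compress_to_webp(data: bytes, quality: int = 80) -> bytes:
    """Convert raw image bytes to WebP, keeping alpha when present."""
    img = Image.open(BytesIO(data))
    if img.mode not in ("RGB", "RGBA"):
        # keep transparency if the image has an alpha band, else flatten to RGB
        img = img.convert("RGBA" if "A" in img.getbands() else "RGB")
    buf = BytesIO()
    img.save(buf, format="WEBP", quality=quality, method=6)  # method 6 = best compression
    return buf.getvalue()
```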
- By default, replaces the original file extension with `.webp`
- Original file extension is removed
- If `replace_original=False`, creates a new file with `.webp` appended
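The naming rule is simple enough to show directly; `webp_name` is an illustrative helper, not the project's actual function:

```python
import os

def webp_name(key: str, replace_original: bool = True) -> str:
    """Derive the WebP object key from the original key."""
    if replace_original:
        base, _ext = os.path.splitext(key)
        return base + ".webp"  # original extension is dropped
    return key + ".webp"       # original name kept, .webp appended
```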
- Database errors: Rolls back transaction on failure
- S3 errors: Logs error and continues with next file
- Image processing errors: Logs error and skips file
- Statistics track processed, failed, and skipped files
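The rollback-on-failure behavior can be sketched as a small wrapper (illustrative, assuming a SQLAlchemy-style session with `commit()`/`rollback()`):

```python
def commit_or_rollback(session, apply_change):
    """Apply a change and commit; roll back and report failure on any error."""
    try:
        apply_change()
        session.commit()
        return True
    except Exception:
        session.rollback()  # leave the database unchanged for this record
        return False
```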
The script provides statistics for each schema:
- `processed`: Number of successfully processed files
- `failed`: Number of files that failed to process
- `skipped`: Number of files that were skipped (e.g., already WebP, no URL)
- `boto3`: AWS S3 client
- `sqlalchemy`: Database ORM
- `psycopg2`: PostgreSQL adapter
- `pillow`: Image processing
- `python-dotenv`: Environment variable management
- The script processes files sequentially (not in parallel) to avoid overwhelming the database and S3 service
- Each file is committed to the database immediately after processing
- Files that are already in WebP format are skipped
- The script handles various image formats (JPEG, PNG, GIF, BMP, TIFF, etc.)
- Verify `DATABASE_URL` is correct
- Check that the database server is running
- Verify network connectivity
- Verify AWS credentials are correct
- Check `AWS_S3_ENDPOINT_URL` matches your S3 provider
- Verify the bucket name is correct
- Check network connectivity to S3 endpoint
- Verify image files are valid
- Check file permissions
- Verify Pillow supports the image format
- Check URL format matches supported formats
- Verify the bucket name in the URL matches `AWS_STORAGE_BUCKET_NAME`
- For custom endpoints, ensure the URL structure is correct