Database Media Optimization and Compression (DBMOC)

This project connects to a PostgreSQL database, reads media file URLs from the defined schemas, downloads the images from S3-compatible object storage, compresses and converts them to WebP format, then re-uploads the results and updates the corresponding database records with the new URLs.

Project Structure

src/
├── aws/              # AWS S3 operations (download, upload, compression)
│   ├── __init__.py   # Main AWS module with S3 client and image processing
│   └── aws.py        # Legacy code (can be removed)
├── config/           # Configuration directory
│   └── .env          # Environment variables (create this file)
├── db/               # Database connection and session management
│   ├── __init__.py
│   └── db.py         # Database initialization and session factory
├── schemas/          # SQLAlchemy ORM models
│   ├── __init__.py   # Base model and data fetching utilities
│   ├── events.py     # Event model (cover, poster fields)
│   ├── posts.py      # Post model (media_file field)
│   └── profiles.py   # AccountProfile model (avatar, banner fields)
├── main.py           # Main script that orchestrates the processing
└── pyproject.toml    # Project dependencies

Features

  • Database Integration: Connects to PostgreSQL database using SQLAlchemy
  • Schema Support: Processes media files from multiple schemas:
    • AccountProfile: avatar and banner images
    • Post: media_file images (filtered by media_type='image')
    • Event: cover and poster images
  • S3 Integration: Downloads and uploads files from S3-compatible object storage
  • Image Compression: Converts images to WebP format with configurable quality
  • Automatic Updates: Updates database records with new WebP file URLs
  • Error Handling: Robust error handling with rollback on failures
  • Statistics: Provides processing statistics for each schema

Installation

  1. Install dependencies using uv (or your preferred package manager):

    cd src
    uv sync
  2. Create a .env file in the config/ directory with the variables described in the next section.

Environment Variables

Create a config/.env file with the following required variables:

# Database Configuration
DATABASE_URL=postgresql://user:password@localhost:5432/dbname

# AWS S3 Configuration
AWS_STORAGE_BUCKET_NAME=your-bucket-name
AWS_ACCESS_KEY_ID=your-access-key-id
AWS_SECRET_ACCESS_KEY=your-secret-access-key
AWS_S3_ENDPOINT_URL=https://s3.amazonaws.com
AWS_SERVICE_NAME=s3

# Image Processing Configuration
WEBP_QUALITY=80

Environment Variable Descriptions

  • DATABASE_URL: PostgreSQL connection string
  • AWS_STORAGE_BUCKET_NAME: Name of your S3 bucket
  • AWS_ACCESS_KEY_ID: S3 access key ID
  • AWS_SECRET_ACCESS_KEY: S3 secret access key
  • AWS_S3_ENDPOINT_URL: S3 endpoint URL (default: https://s3.amazonaws.com)
    • For AWS S3: https://s3.amazonaws.com or https://s3.region.amazonaws.com
    • For DigitalOcean Spaces: https://region.digitaloceanspaces.com
    • For MinIO: http://localhost:9000
  • AWS_SERVICE_NAME: S3 service name (default: s3)
  • WEBP_QUALITY: WebP compression quality (0-100, default: 80)
    • Lower values = smaller files but lower quality
    • Higher values = better quality but larger files
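
The src/ modules read these values through python-dotenv. A minimal sketch of that loading step (the path config/.env is relative to where the script is run; variable names match the list above):

    # sketch: load config/.env into the environment with python-dotenv
    import os
    from dotenv import load_dotenv

    load_dotenv("config/.env")          # reads KEY=value pairs into os.environ

    DATABASE_URL = os.environ["DATABASE_URL"]
    WEBP_QUALITY = int(os.getenv("WEBP_QUALITY", "80"))   # documented default is 80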

Usage

Run the main script:

cd src
python main.py

The script will:

  1. Connect to the database
  2. Fetch records from all schemas (AccountProfile, Post, Event)
  3. For each media URL:
    • Download the image from S3
    • Compress and convert to WebP format
    • Upload the compressed image to S3
    • Update the database record with the new URL
  4. Print processing statistics

How It Works

1. Database Connection

  • The db/db.py module initializes a SQLAlchemy engine and session factory
  • Uses connection pooling for efficient database access
  • Loads database URL from config/.env
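
A minimal sketch of such a setup (names like SessionLocal are illustrative; the actual code lives in db/db.py):

    # sketch of a pooled SQLAlchemy engine and session factory, configured from config/.env
    import os
    from dotenv import load_dotenv
    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker

    load_dotenv("config/.env")

    engine = create_engine(
        os.environ["DATABASE_URL"],
        pool_size=5,            # keep a small pool of reusable connections
        pool_pre_ping=True,     # check connections are alive before handing them out
    )
    SessionLocal = sessionmaker(bind=engine)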

2. Schema Data Fetching

  • The schemas/__init__.py module provides a fetch_db_data() function
  • Fetches records with specific fields and optional filter conditions
  • Supports loading only required fields for efficiency
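
Conceptually, the query it issues for the Post schema would look something like this (a sketch assuming the Post model shown under Database Schemas below and the SessionLocal factory sketched above; the real fetch_db_data() signature may differ):

    # conceptual equivalent of fetching only the required Post fields with a filter
    from sqlalchemy import select

    stmt = select(Post.id, Post.media_file).where(Post.media_type == "image")
    with SessionLocal() as session:
        rows = session.execute(stmt).all()   # list of (id, media_file) tuples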

3. Image Processing Pipeline

  • URL Extraction: Extracts S3 key from full URL (supports multiple URL formats)
  • Download: Downloads image from S3 as bytes
  • Compression: Converts image to WebP format with configurable quality
  • Upload: Uploads compressed image to S3
  • Database Update: Updates database record with new URL
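
The download and upload steps go through boto3 using the credentials from config/.env. A rough sketch (function names here are illustrative, not the project's exact API):

    # sketch of the S3 download/upload steps with boto3
    import os
    import boto3
    from dotenv import load_dotenv

    load_dotenv("config/.env")

    s3 = boto3.client(
        os.getenv("AWS_SERVICE_NAME", "s3"),
        endpoint_url=os.getenv("AWS_S3_ENDPOINT_URL", "https://s3.amazonaws.com"),
        aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
    )
    BUCKET = os.environ["AWS_STORAGE_BUCKET_NAME"]

    def download_bytes(key: str) -> bytes:
        """Fetch an object from S3 and return its raw bytes."""
        return s3.get_object(Bucket=BUCKET, Key=key)["Body"].read()

    def upload_bytes(key: str, data: bytes) -> None:
        """Write bytes back to S3 under the given key."""
        s3.put_object(Bucket=BUCKET, Key=key, Body=data, ContentType="image/webp")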

4. S3 URL Formats Supported

  • Virtual-hosted-style: https://bucket.s3.amazonaws.com/key
  • Path-style: https://s3.amazonaws.com/bucket/key
  • Custom endpoints: https://endpoint/bucket/key or https://bucket.endpoint/key
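
As a simplified illustration of the key extraction (not the project's exact logic):

    # simplified sketch: extract the S3 object key from a full URL
    from urllib.parse import urlparse

    def extract_key(url: str, bucket: str) -> str:
        parsed = urlparse(url)
        path = parsed.path.lstrip("/")
        if parsed.netloc.startswith(f"{bucket}."):
            return path                        # virtual-hosted-style: key is the whole path
        if path.startswith(f"{bucket}/"):
            return path[len(bucket) + 1:]      # path-style: strip the leading bucket name
        return path                            # fallback for other custom-endpoint layouts

    print(extract_key("https://my-bucket.s3.amazonaws.com/media/avatar.png", "my-bucket"))
    # -> media/avatar.png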

Database Schemas

AccountProfile (accounts_profile)

  • id: Integer (primary key)
  • avatar: String(255) - Avatar image URL
  • banner: String(255) - Banner image URL

Post (post_postmedia)

  • id: Integer (primary key)
  • media_type: String(20) - Media type (filtered for 'image')
  • media_file: String(255) - Media file URL

Event (event_event)

  • id: Integer (primary key)
  • cover: String(255) - Cover image URL
  • poster: String(255) - Poster image URL
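
For reference, a hedged sketch of how one of these models could be declared with SQLAlchemy's declarative base (the actual models live in src/schemas/):

    # sketch of the AccountProfile model, mirroring the fields listed above
    from sqlalchemy import Column, Integer, String
    from sqlalchemy.orm import declarative_base

    Base = declarative_base()

    class AccountProfile(Base):
        __tablename__ = "accounts_profile"

        id = Column(Integer, primary_key=True)
        avatar = Column(String(255))   # avatar image URL
        banner = Column(String(255))   # banner image URL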

Image Processing

WebP Compression

  • Preserves transparency when present (RGBA mode)
  • Converts images to RGB for non-transparent images
  • Configurable quality setting (0-100)
  • Uses method 6 (best compression) for WebP encoding
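
A small Pillow sketch of that conversion (the project's helper may differ in naming and details):

    # sketch of WebP conversion with Pillow, preserving transparency when present
    import io
    from PIL import Image

    def compress_to_webp(data: bytes, quality: int = 80) -> bytes:
        img = Image.open(io.BytesIO(data))
        if img.mode in ("RGBA", "LA", "P"):
            img = img.convert("RGBA")    # keep the alpha channel
        else:
            img = img.convert("RGB")
        out = io.BytesIO()
        img.save(out, format="WEBP", quality=quality, method=6)  # method=6: slowest, best compression
        return out.getvalue()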

File Replacement Strategy

  • By default, replaces original file with .webp extension
  • Original file extension is removed
  • If replace_original=False, creates a new file with .webp appended
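
An illustrative key rewrite along those lines:

    # illustrative rewrite of the object key for the compressed file
    import os.path

    def webp_key(key: str, replace_original: bool = True) -> str:
        if replace_original:
            root, _ext = os.path.splitext(key)
            return root + ".webp"    # media/avatar.png -> media/avatar.webp
        return key + ".webp"         # media/avatar.png -> media/avatar.png.webp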

Error Handling

  • Database errors: Rolls back transaction on failure
  • S3 errors: Logs error and continues with next file
  • Image processing errors: Logs error and skips file
  • Statistics track processed, failed, and skipped files
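
A hedged sketch of the per-file handling (exception types, counters, and the helper name are illustrative):

    # illustrative per-file error handling with rollback on database failure
    from botocore.exceptions import ClientError
    from sqlalchemy.exc import SQLAlchemyError

    stats = {"processed": 0, "failed": 0, "skipped": 0}

    def process_one(session, record, field):
        try:
            ...                          # download, compress, upload, update the record
            session.commit()
            stats["processed"] += 1
        except SQLAlchemyError:
            session.rollback()           # undo the partial database change
            stats["failed"] += 1
        except (ClientError, OSError):
            stats["failed"] += 1         # S3 or image error: log and continue with the next file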

Statistics

The script provides statistics for each schema:

  • processed: Number of successfully processed files
  • failed: Number of files that failed to process
  • skipped: Number of files that were skipped (e.g., already WebP, no URL)

Dependencies

  • boto3: AWS S3 client
  • sqlalchemy: Database ORM
  • psycopg2: PostgreSQL adapter
  • pillow: Image processing
  • python-dotenv: Environment variable management

Notes

  • The script processes files sequentially (not in parallel) to avoid overwhelming the database and S3 service
  • Each file is committed to the database immediately after processing
  • Files that are already in WebP format are skipped
  • The script handles various image formats (JPEG, PNG, GIF, BMP, TIFF, etc.)

Troubleshooting

Database Connection Issues

  • Verify DATABASE_URL is correct
  • Check database server is running
  • Verify network connectivity

S3 Connection Issues

  • Verify AWS credentials are correct
  • Check AWS_S3_ENDPOINT_URL matches your S3 provider
  • Verify bucket name is correct
  • Check network connectivity to S3 endpoint

Image Processing Issues

  • Verify image files are valid
  • Check file permissions
  • Verify Pillow supports the image format

URL Parsing Issues

  • Check URL format matches supported formats
  • Verify bucket name in URL matches AWS_STORAGE_BUCKET_NAME
  • For custom endpoints, ensure URL structure is correct

About

A simple Python project using SQLAlchemy and Boto3 that reads media files from object storage, compresses them, writes them back to object storage, and updates the corresponding database records.
