A Python-based AWS Lambda function that extracts key frames from videos using scene detection technology. The service automatically processes videos stored in S3, detects scene changes, and extracts representative frames from each scene.
This service is designed to:
- Process videos from S3 buckets
- Detect scene changes using PySceneDetect
- Extract representative frames from each scene
- Upload frames to S3
- Track processing status in Supabase
- Runtime: Python on AWS Lambda
- Storage: AWS S3 for videos and frames
- Database: Supabase for status tracking
- Queue: SQS for processing requests
- Container: Deployed as a Docker container on ECR
Create a `.env` file based on `.env.sample`. The configuration is organized into several sections:

```
S3_BUCKET_VIDEOS=oriane-contents  # Bucket for source videos
S3_BUCKET_FRAMES=oriane-contents  # Bucket for extracted frames

SUPABASE_URL=https://your-project-ref.supabase.co
SUPABASE_KEY=your-service-role-key

MIN_FRAMES=3            # Minimum frames per video if scenes < MIN_FRAMES
SCENE_THRESHOLD=27.0    # PySceneDetect content-detector threshold
CONCURRENCY_LIMIT=2     # How many videos to process in parallel
MIN_REMAINING_MS=60000  # Lambda timeout buffer (ms)

S3_RETRIES=3            # Number of retries for S3 operations
S3_RETRY_DELAY=1000     # Delay between retries (ms)
S3_CONNECT_TIMEOUT=10   # S3 connection timeout (seconds)
S3_READ_TIMEOUT=60      # S3 read timeout (seconds)

DEBUG=true              # Enable debug-level logging
```

- Scene-based frame extraction using content detection
- Fallback to evenly-spaced frames if minimum scene count not met
- Parallel frame upload to S3
- Comprehensive error handling and logging
- Configurable thresholds and limits
The Lambda function expects SQS messages in the following format:
```json
{
  "shortcode": "VIDEO_ID",
  "platform": "PLATFORM_NAME"
}
```

The Lambda function processes videos through several distinct steps:
- Receives SQS event containing video records
- Parses the record body to extract `shortcode` and `platform`
- Sets up logging context with the video's shortcode
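A minimal sketch of that parsing step (the `parse_record` helper name is hypothetical):

```python
import json

def parse_record(record: dict) -> tuple[str, str]:
    # SQS delivers the message payload as a JSON string in the "body" field.
    body = json.loads(record["body"])
    return body["shortcode"], body["platform"]
```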
- Checks remaining Lambda execution time to prevent timeouts
- Queries Supabase to check video status:
  - `is_downloaded`: Confirms video exists in S3
  - `is_extracted`: Prevents duplicate processing
- Skips processing if video isn't downloaded or already extracted
- Records status in extraction_errors table if needed
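The timeout check can lean on the Lambda context object's `get_remaining_time_in_millis()`; a sketch (the `should_stop` helper name is illustrative):

```python
MIN_REMAINING_MS = 60000  # buffer from the configuration above

def should_stop(context) -> bool:
    # Bail out early rather than letting Lambda hard-kill the function mid-video.
    return context.get_remaining_time_in_millis() < MIN_REMAINING_MS
```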
- Constructs S3 key path: `{platform}/{code}/video.mp4`
- Downloads video to temporary local storage
- Implements retry logic with configurable attempts
- Verifies download success before proceeding
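The retry logic can be factored into a small wrapper shared by downloads and uploads; a sketch (the `with_retries` name is illustrative, not the service's documented API):

```python
import time

def with_retries(fn, retries: int = 3, delay_ms: int = 1000):
    # Re-invoke fn until it succeeds or the retry budget is exhausted,
    # sleeping the configured delay (S3_RETRY_DELAY) between attempts.
    for attempt in range(1, retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise
            time.sleep(delay_ms / 1000.0)

# e.g. with_retries(lambda: s3.download_file(bucket, key, local_path))
```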
- Detects scene changes using PySceneDetect:
- Uses ContentDetector with configurable threshold
- Identifies major visual transitions
- Selects representative frames:
- Takes middle frame from each detected scene
- Falls back to evenly-spaced frames if scene count < MIN_FRAMES
- Saves frames as JPEG files in temporary directory
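The selection logic above can be sketched as a pure function over `(start_frame, end_frame)` scene pairs, such as those PySceneDetect's `detect()` yields once its FrameTimecodes are converted to frame numbers (`pick_frame_indices` is a hypothetical helper):

```python
def pick_frame_indices(scenes, total_frames: int, min_frames: int = 3) -> list[int]:
    # One representative (middle) frame per detected scene.
    indices = [(start + end) // 2 for start, end in scenes]
    if len(indices) < min_frames:
        # Fallback: evenly-spaced frames across the whole video.
        step = total_frames / (min_frames + 1)
        indices = [int(step * (i + 1)) for i in range(min_frames)]
    return indices
```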
- Uploads extracted frames to S3:
  - Destination path: `{platform}/{code}/frames/{index}.jpg`
  - Uses parallel upload with ThreadPoolExecutor
  - Implements retry logic for failed uploads
- Cleans up temporary files after upload
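A sketch of the parallel upload step, assuming a boto3-style client with `upload_file(path, bucket, key)` (the `upload_frames` helper name is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def upload_frames(s3, bucket: str, prefix: str, frame_paths, max_workers: int = 4):
    # Upload every extracted frame in parallel; returns the S3 keys written,
    # following the {platform}/{code}/frames/{index}.jpg layout above.
    def _upload(index: int, path: str) -> str:
        key = f"{prefix}/frames/{index}.jpg"
        s3.upload_file(path, bucket, key)
        return key

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(_upload, i, p) for i, p in enumerate(frame_paths)]
        return [f.result() for f in futures]
```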
- Updates Supabase on successful extraction:
  - Sets `is_extracted = true`
  - Records the number of extracted frames
- Records any errors in extraction_errors table
- Cleans up all temporary files and resources
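The status update might look like the following against supabase-py's chained query builder (the `videos` table name and `frames_extracted` column are assumptions; only `is_extracted` is named above):

```python
def mark_extracted(client, shortcode: str, frame_count: int):
    # "videos" table and "frames_extracted" column are assumed names.
    return (
        client.table("videos")
        .update({"is_extracted": True, "frames_extracted": frame_count})
        .eq("shortcode", shortcode)
        .execute()
    )
```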
- Processes multiple videos in parallel
- Respects CONCURRENCY_LIMIT setting
- Aggregates results from all processed videos
- Returns comprehensive status report
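A sketch of that batch orchestration (names illustrative; `process_one` stands in for the per-video pipeline above):

```python
from concurrent.futures import ThreadPoolExecutor

def process_batch(records, process_one, concurrency_limit: int = 2) -> dict:
    # Bounded parallelism per CONCURRENCY_LIMIT; aggregate per-video results
    # into a single status report.
    with ThreadPoolExecutor(max_workers=concurrency_limit) as pool:
        results = list(pool.map(process_one, records))
    return {
        "processed": len(results),
        "succeeded": sum(1 for r in results if r.get("ok")),
        "failed": sum(1 for r in results if not r.get("ok")),
    }
```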
A test suite (test.py) is provided for local testing and validation:
```bash
# Run all tests
python3 test.py
```

The test suite includes:
- Bulk extraction testing
- Single video processing
- Debug mode for detailed logging
Use the provided push_image.sh script to build and deploy to AWS ECR:
```bash
# Build and deploy Docker image
./push_image.sh
```

The script handles:
- ECR repository creation
- Docker image building
- Authentication
- Image pushing with timeout handling
The service includes comprehensive error handling for:
- Missing or corrupted videos
- Scene detection failures
- S3 upload/download issues
- Database connection problems
- Lambda timeout prevention
Main Python packages:
- `opencv-python`: Video frame extraction
- `scenedetect`: Scene change detection
- `boto3`: AWS SDK
- `supabase`: Database client
The service provides detailed logging with:
- Configurable debug mode
- Shortcode-based context
- Operation tracking
- Error reporting
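One conventional way to get the shortcode-based context is a `logging.LoggerAdapter` subclass; a sketch, not necessarily the service's exact implementation:

```python
import logging

class ShortcodeAdapter(logging.LoggerAdapter):
    # Prefix every message with the video's shortcode for easy filtering.
    def process(self, msg, kwargs):
        return f"[{self.extra['shortcode']}] {msg}", kwargs

def get_video_logger(shortcode: str, debug: bool = False) -> logging.LoggerAdapter:
    logger = logging.getLogger("frame-extractor")
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    return ShortcodeAdapter(logger, {"shortcode": shortcode})
```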
- Clone the repository
- Copy `.env.sample` to `.env`
- Update the environment variables in `.env` with your credentials
- Install dependencies
- Run tests to verify setup
- Deploy using the deployment script
- The service requires appropriate AWS IAM permissions for S3 and Lambda
- Supabase credentials must be configured before deployment
- Consider Lambda timeout limits when processing longer videos
- Monitor S3 costs for video and frame storage
- Keep your `.env` file secure and never commit it to version control
- Ensure all tests pass locally
- Maintain the existing error handling patterns
- Update documentation for any new features
- Follow the established logging conventions
- Update `.env.sample` if adding new configuration options