A Python-based AWS Lambda function that extracts key frames from videos using scene detection technology. The service automatically processes videos stored in S3, detects scene changes, and extracts representative frames from each scene.
This service is designed to:
- Process videos from S3 buckets
- Detect scene changes using PySceneDetect
- Extract representative frames from each scene
- Upload frames to S3
- Track processing status in Supabase
- Runtime: Python on AWS Lambda
- Storage: AWS S3 for videos and frames
- Database: Supabase for status tracking
- Queue: SQS for processing requests
- Container: Deployed as a Docker container on ECR
Create a `.env` file based on `.env.sample`. The configuration is organized into several sections:

```
S3_BUCKET_VIDEOS=oriane-contents  # Bucket for source videos
S3_BUCKET_FRAMES=oriane-contents  # Bucket for extracted frames

SUPABASE_URL=https://your-project-ref.supabase.co
SUPABASE_KEY=your-service-role-key

MIN_FRAMES=3            # Minimum frames per video if scenes < MIN_FRAMES
SCENE_THRESHOLD=27.0    # PySceneDetect content-detector threshold
CONCURRENCY_LIMIT=2     # How many videos to process in parallel
MIN_REMAINING_MS=60000  # Lambda timeout buffer (ms)

S3_RETRIES=3            # Number of retries for S3 operations
S3_RETRY_DELAY=1000     # Delay between retries (ms)
S3_CONNECT_TIMEOUT=10   # S3 connection timeout (seconds)
S3_READ_TIMEOUT=60      # S3 read timeout (seconds)

DEBUG=true              # Enable debug-level logging
```

- Scene-based frame extraction using content detection
- Fallback to evenly-spaced frames if minimum scene count not met
- Parallel frame upload to S3
- Comprehensive error handling and logging
- Configurable thresholds and limits
The Lambda function expects SQS messages in the following format:
```json
{
  "shortcode": "VIDEO_ID",
  "platform": "PLATFORM_NAME"
}
```

The Lambda function processes videos through several distinct steps:
- Receives SQS event containing video records
- Parses the record body to extract `shortcode` and `platform`
- Sets up logging context with the video's shortcode
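A minimal sketch of that parsing step (the `parse_record` helper name is hypothetical):

```python
import json

def parse_record(record: dict) -> tuple[str, str]:
    # SQS delivers the message payload as a JSON string in the "body" field.
    body = json.loads(record["body"])
    return body["shortcode"], body["platform"]
```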
- Checks remaining Lambda execution time to prevent timeouts
- Queries Supabase to check video status:
  - `is_downloaded`: Confirms video exists in S3
  - `is_extracted`: Prevents duplicate processing
- Skips processing if video isn't downloaded or already extracted
- Records status in extraction_errors table if needed
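The timeout check can lean on the Lambda context object's `get_remaining_time_in_millis()`; a sketch (the `should_stop` helper name is illustrative):

```python
MIN_REMAINING_MS = 60000  # buffer from the configuration above

def should_stop(context) -> bool:
    # Bail out early rather than letting Lambda hard-kill the function mid-video.
    return context.get_remaining_time_in_millis() < MIN_REMAINING_MS
```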
- Constructs S3 key path: `{platform}/{code}/video.mp4`
- Downloads video to temporary local storage
- Implements retry logic with configurable attempts
- Verifies download success before proceeding
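The retry logic can be factored into a small wrapper shared by downloads and uploads; a sketch (the `with_retries` name is illustrative, not the service's documented API):

```python
import time

def with_retries(fn, retries: int = 3, delay_ms: int = 1000):
    # Re-invoke fn until it succeeds or the retry budget is exhausted,
    # sleeping the configured delay (S3_RETRY_DELAY) between attempts.
    for attempt in range(1, retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == retries:
                raise
            time.sleep(delay_ms / 1000.0)

# e.g. with_retries(lambda: s3.download_file(bucket, key, local_path))
```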
- Detects scene changes using PySceneDetect:
- Uses ContentDetector with configurable threshold
- Identifies major visual transitions
- Selects representative frames:
- Takes middle frame from each detected scene
- Falls back to evenly-spaced frames if scene count < MIN_FRAMES
- Saves frames as JPEG files in temporary directory
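The selection logic above can be sketched as a pure function over `(start_frame, end_frame)` scene pairs, such as those PySceneDetect's `detect()` yields once its FrameTimecodes are converted to frame numbers (`pick_frame_indices` is a hypothetical helper):

```python
def pick_frame_indices(scenes, total_frames: int, min_frames: int = 3) -> list[int]:
    # One representative (middle) frame per detected scene.
    indices = [(start + end) // 2 for start, end in scenes]
    if len(indices) < min_frames:
        # Fallback: evenly-spaced frames across the whole video.
        step = total_frames / (min_frames + 1)
        indices = [int(step * (i + 1)) for i in range(min_frames)]
    return indices
```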
- Uploads extracted frames to S3:
  - Destination path: `{platform}/{code}/frames/{index}.jpg`
  - Uses parallel upload with ThreadPoolExecutor
  - Implements retry logic for failed uploads
- Cleans up temporary files after upload
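A sketch of the parallel upload step, assuming a boto3-style client with `upload_file(path, bucket, key)` (the `upload_frames` helper name is illustrative):

```python
from concurrent.futures import ThreadPoolExecutor

def upload_frames(s3, bucket: str, prefix: str, frame_paths, max_workers: int = 4):
    # Upload every extracted frame in parallel; returns the S3 keys written,
    # following the {platform}/{code}/frames/{index}.jpg layout above.
    def _upload(index: int, path: str) -> str:
        key = f"{prefix}/frames/{index}.jpg"
        s3.upload_file(path, bucket, key)
        return key

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(_upload, i, p) for i, p in enumerate(frame_paths)]
        return [f.result() for f in futures]
```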
- Updates Supabase on successful extraction:
  - Sets `is_extracted = true`
  - Records the number of extracted frames
- Records any errors in extraction_errors table
- Cleans up all temporary files and resources
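The status update might look like the following against supabase-py's chained query builder (the `videos` table name and `frames_extracted` column are assumptions; only `is_extracted` is named above):

```python
def mark_extracted(client, shortcode: str, frame_count: int):
    # "videos" table and "frames_extracted" column are assumed names.
    return (
        client.table("videos")
        .update({"is_extracted": True, "frames_extracted": frame_count})
        .eq("shortcode", shortcode)
        .execute()
    )
```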
- Processes multiple videos in parallel
- Respects CONCURRENCY_LIMIT setting
- Aggregates results from all processed videos
- Returns comprehensive status report
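A sketch of that batch orchestration (names illustrative; `process_one` stands in for the per-video pipeline above):

```python
from concurrent.futures import ThreadPoolExecutor

def process_batch(records, process_one, concurrency_limit: int = 2) -> dict:
    # Bounded parallelism per CONCURRENCY_LIMIT; aggregate per-video results
    # into a single status report.
    with ThreadPoolExecutor(max_workers=concurrency_limit) as pool:
        results = list(pool.map(process_one, records))
    return {
        "processed": len(results),
        "succeeded": sum(1 for r in results if r.get("ok")),
        "failed": sum(1 for r in results if not r.get("ok")),
    }
```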
A test suite (test.py) is provided for local testing and validation:
```bash
# Run all tests
python3 test.py
```

The test suite includes:
- Bulk extraction testing
- Single video processing
- Debug mode for detailed logging
Use the provided push_image.sh script to build and deploy to AWS ECR:
```bash
# Build and deploy Docker image
./push_image.sh
```

The script handles:
- ECR repository creation
- Docker image building
- Authentication
- Image pushing with timeout handling
The service includes comprehensive error handling for:
- Missing or corrupted videos
- Scene detection failures
- S3 upload/download issues
- Database connection problems
- Lambda timeout prevention
Main Python packages:
- `opencv-python`: Video frame extraction
- `scenedetect`: Scene change detection
- `boto3`: AWS SDK
- `supabase`: Database client
The service provides detailed logging with:
- Configurable debug mode
- Shortcode-based context
- Operation tracking
- Error reporting
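One conventional way to get the shortcode-based context is a `logging.LoggerAdapter` subclass; a sketch, not necessarily the service's exact implementation:

```python
import logging

class ShortcodeAdapter(logging.LoggerAdapter):
    # Prefix every message with the video's shortcode for easy filtering.
    def process(self, msg, kwargs):
        return f"[{self.extra['shortcode']}] {msg}", kwargs

def get_video_logger(shortcode: str, debug: bool = False) -> logging.LoggerAdapter:
    logger = logging.getLogger("frame-extractor")
    logger.setLevel(logging.DEBUG if debug else logging.INFO)
    return ShortcodeAdapter(logger, {"shortcode": shortcode})
```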
- Clone the repository
- Copy `.env.sample` to `.env`
- Update the environment variables in `.env` with your credentials
- Install dependencies
- Run tests to verify setup
- Deploy using the deployment script
- The service requires appropriate AWS IAM permissions for S3 and Lambda
- Supabase credentials must be configured before deployment
- Consider Lambda timeout limits when processing longer videos
- Monitor S3 costs for video and frame storage
- Keep your `.env` file secure and never commit it to version control
- Ensure all tests pass locally
- Maintain the existing error handling patterns
- Update documentation for any new features
- Follow the established logging conventions
- Update `.env.sample` if adding new configuration options