This project outlines a modular, cloud-native workflow for automating audio and video media processing using AWS services. It handles media uploads, conversion, metadata extraction, and asset preparation for public delivery.
Note: This is a generic template. You’ll need to customize bucket names, IAM roles, CORS policies, lifecycle rules, file paths, and environment variables to match your environment and security posture.
- `s3://<watch-bucket>/inputs/`: Incoming upload directory (watch folder)
- `s3://<public-bucket>/`: Destination for processed media
  - `/audiovideo/`: Transcoded media files
  - `/closed-caption/`: VTT caption files
  - `/json/`: Video metadata
  - `/original/`: Raw uploaded files
  - `/thumbnail/`: Splash images (JPG)
- Files are uploaded via a web interface (e.g., using `plupload`) to the `inputs/` folder.
- This triggers an AWS Lambda function (e.g., `MediaLambdaHandler`).
- Determines file type (audio/video) and checks for multipart CHUNK uploads.
  - CHUNK files (e.g., `.CHUNK0`, `.CHUNK1`) are ignored until fully assembled.
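The chunk check and the audio/video decision can be sketched as two small helpers. This is a minimal illustration, not the handler's actual code: the function names and the extension lists are hypothetical, and the chunk pattern assumes parts always end in `.CHUNK<n>` as in the examples above.

```python
import os
import re

# Assumption: chunk parts end in ".CHUNK<n>", as in the examples above.
CHUNK_SUFFIX = re.compile(r"\.CHUNK\d+$", re.IGNORECASE)

# Hypothetical extension lists; extend them to match the formats you accept.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".avi"}


def is_chunk_key(key: str) -> bool:
    """Return True for partial uploads such as 'inputs/talk.mp4.CHUNK0'."""
    return CHUNK_SUFFIX.search(key) is not None


def classify_key(key: str) -> str:
    """Return 'chunk', 'audio', 'video', or 'unknown' for an S3 object key."""
    if is_chunk_key(key):
        return "chunk"
    extension = os.path.splitext(key)[1].lower()
    if extension in AUDIO_EXTENSIONS:
        return "audio"
    if extension in VIDEO_EXTENSIONS:
        return "video"
    return "unknown"
```

A handler built this way would return early for `"chunk"` and `"unknown"` keys and only continue once the assembled file arrives.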
Once a complete file is detected:
- A custom-compiled binary (e.g., `pymediainfo`) analyzes the file.
- Extracts key metadata: media type, resolution, duration, etc.
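As a rough sketch of the extraction step, the helper below reduces MediaInfo-style tracks to the fields listed above. It assumes tracks expose `track_type`, `width`, `height`, and `duration` attributes, which is how `pymediainfo` presents them; the function name and the output keys are hypothetical.

```python
from types import SimpleNamespace


def summarize_tracks(tracks):
    """Collapse MediaInfo-style tracks into a small metadata dict."""
    summary = {"media_type": "unknown", "width": None, "height": None, "duration_ms": None}
    for track in tracks:
        track_type = getattr(track, "track_type", None)
        if track_type == "General":
            # The general track carries the container duration in milliseconds.
            summary["duration_ms"] = getattr(track, "duration", None)
        elif track_type == "Video":
            summary["media_type"] = "video"
            summary["width"] = getattr(track, "width", None)
            summary["height"] = getattr(track, "height", None)
        elif track_type == "Audio" and summary["media_type"] == "unknown":
            summary["media_type"] = "audio"
    return summary


# In the Lambda, the tracks would come from the pymediainfo layer:
#     from pymediainfo import MediaInfo
#     tracks = MediaInfo.parse("/tmp/upload.mp4").tracks
# Stand-in tracks are used here so the sketch runs without the layer.
example_tracks = [
    SimpleNamespace(track_type="General", duration=95000),
    SimpleNamespace(track_type="Video", width=1280, height=720),
    SimpleNamespace(track_type="Audio"),
]
example_summary = summarize_tracks(example_tracks)
```

A file with any video track is treated as video; a file with only audio tracks is treated as audio.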
- MP3 files are moved directly to the final media bucket (e.g., `audiovideo/`).
- No additional processing is required.
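S3 has no native move operation, so a move is usually a copy followed by a delete. The sketch below shows that pattern with a boto3-style client passed in; the function names and the key mapping are assumptions for illustration.

```python
def audio_destination_key(key, prefix="audiovideo/"):
    """Map 'inputs/song.mp3' to 'audiovideo/song.mp3'."""
    return prefix + key.rsplit("/", 1)[-1]


def move_object(s3_client, source_bucket, key, destination_bucket, destination_key):
    """Copy an object to the destination bucket, then delete the source."""
    s3_client.copy_object(
        Bucket=destination_bucket,
        Key=destination_key,
        CopySource={"Bucket": source_bucket, "Key": key},
    )
    # Delete only after the copy call has returned without raising.
    s3_client.delete_object(Bucket=source_bucket, Key=key)
```

In a Lambda this would be called with `boto3.client("s3")`. Note that `copy_object` is limited to objects up to 5 GB; larger files need a multipart copy.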
- Metadata determines which resolutions to create (e.g., skip 1080p for a 720p source).
- A template job configuration (e.g., `job.json`) is modified in-memory.
- AWS MediaConvert is invoked to:
  - Create multiple resolution variants (360p, 480p, 720p, 1080p).
  - Append resolution to filenames (`filename_1080p.mp4`).
  - Store the files in the designated output folder.
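The rendition selection and the in-memory template edit might look like the sketch below. It assumes the template is the MediaConvert `Settings` object with one input and one file output group, and that the first output in that group acts as a prototype for every rendition; your `job.json` may be organized differently. The function names are hypothetical.

```python
import copy

# Rendition ladder from the list above, keyed by output height in pixels.
RENDITION_HEIGHTS = [360, 480, 720, 1080]


def select_heights(source_height):
    """Keep only renditions that do not upscale the source."""
    return [h for h in RENDITION_HEIGHTS if h <= source_height]


def build_job_settings(template, input_uri, destination_uri, source_height):
    """Return a copy of the template with input, destination, and outputs filled in."""
    settings = copy.deepcopy(template)  # leave the loaded template untouched
    settings["Inputs"][0]["FileInput"] = input_uri
    group = settings["OutputGroups"][0]
    group["OutputGroupSettings"]["FileGroupSettings"]["Destination"] = destination_uri
    prototype = group["Outputs"][0]
    outputs = []
    for height in select_heights(source_height):
        output = copy.deepcopy(prototype)
        output["NameModifier"] = f"_{height}p"  # yields filename_720p.mp4 and so on
        output["VideoDescription"]["Height"] = height
        outputs.append(output)
    group["Outputs"] = outputs
    return settings
```

The result would be passed to the MediaConvert client as `create_job(Role=..., Settings=settings)`. Sources below 360p produce an empty output list in this sketch and would need their own handling.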
- Video metadata is saved as a `.json` file with a matching name in the `/json/` folder.
- A separate Lambda function (e.g., `SaveVideoFrame`) extracts a JPG frame (default at 7 seconds).
- The thumbnail is saved to the `/thumbnail/` folder.
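One way to extract the frame is to shell out to the `ffmpeg` binary from the layer. The sketch below builds the command separately from running it so the arguments are easy to inspect. The function names, the key mapping, and the default binary path are assumptions; a Lambda layer typically exposes the binary under `/opt`.

```python
import os
import subprocess


def build_frame_command(source_path, output_path, offset_seconds=7, ffmpeg_path="ffmpeg"):
    """Build an ffmpeg command that writes one JPG frame taken at the offset."""
    return [
        ffmpeg_path,
        "-y",                        # overwrite the output file if it exists
        "-ss", str(offset_seconds),  # seek on the input before decoding
        "-i", source_path,
        "-frames:v", "1",            # emit a single video frame
        "-q:v", "2",                 # high JPEG quality
        output_path,
    ]


def thumbnail_key(original_key):
    """Map 'original/talk.mp4' to 'thumbnail/talk.jpg'."""
    stem = os.path.splitext(os.path.basename(original_key))[0]
    return f"thumbnail/{stem}.jpg"


def extract_frame(source_path, output_path, offset_seconds=7):
    """Run ffmpeg; raises CalledProcessError if the process exits non-zero."""
    subprocess.run(build_frame_command(source_path, output_path, offset_seconds), check=True)
```

Videos shorter than the offset may yield no frame, so a fallback to an earlier timestamp is worth considering.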
- Caption files uploaded by users are converted to `.vtt` if necessary.
- They are stored with no additional processing in `/closed-caption/`.
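The source format of uploaded captions is not specified here. Assuming SubRip (`.srt`) input, which is a common case, a minimal conversion adds the `WEBVTT` header and switches the millisecond separator from a comma to a dot. The function name is hypothetical.

```python
import re

# Matches SRT timestamps such as 00:01:02,500 so the comma can become a dot.
SRT_TIMESTAMP = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text):
    """Convert SubRip caption text to WebVTT."""
    # Normalize line endings, drop a leading byte-order mark, trim blank edges.
    body = srt_text.replace("\r\n", "\n").lstrip("\ufeff").strip()
    body = SRT_TIMESTAMP.sub(r"\1.\2", body)
    # SRT sequence numbers are kept; WebVTT accepts them as cue identifiers.
    return "WEBVTT\n\n" + body + "\n"
```

This covers plain cues only. Styling tags and positioning data would need a fuller converter.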
Lambda Source Code
- `MediaLambdaHandler.py`: Main Lambda for processing uploads
- `SaveVideoFrame.py`: Lambda for thumbnail extraction
- `job.json`: Template for AWS MediaConvert job settings
Custom Binaries
- `pymediainfo` and `ffmpeglayer` should be compiled and packaged as Lambda layers compatible with your Python runtime (e.g., 3.7).
- A good starting guide: https://binx.io/blog/2017/10/20/how-to-install-python-binaries-in-aws-lambda/
- Environment Variables (set per deployment):
  - `APPLICATION_NAME`
  - `DESTINATION_BUCKET`
  - `MEDIACONVERT_ROLE_ARN`
- Basic Settings:
  - Runtime: Python 3.7+
  - Memory: 128 MB
  - Timeout: 2 minutes
- S3 Trigger: Event type `ObjectCreated` with prefix `inputs/`
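The configuration above maps onto two small pieces at the top of a handler: reading the environment variables and pulling bucket and key out of the S3 event. The sketch below assumes the standard S3 notification payload; the function names are hypothetical.

```python
import os
from urllib.parse import unquote_plus


def load_config(environ=os.environ):
    """Read the three deployment variables; raises KeyError if one is missing."""
    return {
        "application_name": environ["APPLICATION_NAME"],
        "destination_bucket": environ["DESTINATION_BUCKET"],
        "mediaconvert_role_arn": environ["MEDIACONVERT_ROLE_ARN"],
    }


def uploaded_objects(event, prefix="inputs/"):
    """Yield (bucket, key) pairs from an S3 ObjectCreated event."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes keys in event payloads (spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        if key.startswith(prefix):
            yield bucket, key
```

Failing fast on a missing variable surfaces deployment mistakes in the first invocation rather than partway through a job.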
- Important: Thumbnail generation is memory-intensive.
- Settings:
  - Runtime: Python 3.7+
  - Memory: 3008 MB
  - Timeout: 30 seconds
- Trigger: Upload to `/original/` folder
- CloudWatch logs are enabled by default; all scripts use `logger.info()` for traceability.
- An EventBridge (CloudWatch Events) rule can be set up to notify an SNS topic when MediaConvert jobs complete.
- Subscribers can include dev team members, automated systems, or webhooks.
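For reference, the logging setup and the event pattern for such a rule can be sketched as follows. The pattern uses the source and detail type that MediaConvert publishes for job state changes; the constant and helper names are hypothetical.

```python
import json
import logging

# Lambda attaches a handler to the root logger, so setting the level is enough
# for logger.info() output to reach CloudWatch Logs.
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Event pattern for an EventBridge rule that matches finished MediaConvert jobs.
JOB_COMPLETE_PATTERN = {
    "source": ["aws.mediaconvert"],
    "detail-type": ["MediaConvert Job State Change"],
    "detail": {"status": ["COMPLETE"]},
}


def pattern_json(include_errors=False):
    """Serialize the pattern, optionally matching failed jobs as well."""
    pattern = json.loads(json.dumps(JOB_COMPLETE_PATTERN))  # cheap deep copy
    if include_errors:
        pattern["detail"]["status"].append("ERROR")
    return json.dumps(pattern)
```

The JSON string is what you would supply as the rule's event pattern, with the SNS topic as the rule target. Matching `ERROR` too lets the same topic report failures.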
⚠️ You are responsible for defining IAM roles, policies, and bucket configurations.
- Watchfolder S3 Bucket:
  - Should block public access.
  - Recommended: Lifecycle rule to delete old uploads after 24 hours.
- Public Delivery Bucket:
  - Should be read-accessible as needed (e.g., via CloudFront or direct links).
  - Configure CORS policy to allow cross-origin access as necessary.
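The two bucket policies above can be expressed as the dictionaries that the S3 API expects. This is a starting point under stated assumptions: the rule ID, the allowed methods, and the max age are illustrative values, and the origin list is yours to supply.

```python
def watch_bucket_lifecycle(prefix="inputs/", days=1):
    """Lifecycle configuration that expires uploads under the prefix."""
    return {
        "Rules": [
            {
                "ID": "expire-old-uploads",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def delivery_bucket_cors(allowed_origins):
    """Read-only CORS configuration for the public delivery bucket."""
    return {
        "CORSRules": [
            {
                "AllowedMethods": ["GET", "HEAD"],
                "AllowedOrigins": list(allowed_origins),
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,
            }
        ]
    }


# Applying them with a boto3 S3 client would look like:
#     s3.put_bucket_lifecycle_configuration(
#         Bucket="<watch-bucket>", LifecycleConfiguration=watch_bucket_lifecycle())
#     s3.put_bucket_cors(
#         Bucket="<public-bucket>", CORSConfiguration=delivery_bucket_cors(["https://example.com"]))
```

Lifecycle expiration is expressed in whole days and runs asynchronously, so "24 hours" is approximate: objects may linger somewhat beyond one day before removal.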
- Use Infrastructure-as-Code (e.g., CloudFormation, Terraform, CDK) to manage:
  - Lambda functions
  - S3 buckets and policies
  - MediaConvert roles and permissions
- Set up versioning on buckets if retaining media history is needed.
- Automate unit tests for metadata validation and file integrity.
For local testing:
- Use a tool like localstack to mock AWS services.
- Upload test media files to a local S3 bucket to simulate the full pipeline.
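One way to switch between real AWS and a local mock is to build the client arguments from an environment variable. In the sketch below, `LOCALSTACK_ENDPOINT` is a hypothetical variable name, and the dummy credentials and region are placeholders that LocalStack accepts; its default endpoint is `http://localhost:4566`.

```python
import os


def s3_client_kwargs(environ=os.environ):
    """Return boto3 client arguments, pointing at LocalStack when configured."""
    endpoint = environ.get("LOCALSTACK_ENDPOINT")  # hypothetical variable name
    if not endpoint:
        return {}  # fall back to boto3's normal AWS configuration
    return {
        "endpoint_url": endpoint,
        "aws_access_key_id": "test",
        "aws_secret_access_key": "test",
        "region_name": "us-east-1",
    }


# Usage with boto3 installed:
#     import boto3
#     s3 = boto3.client("s3", **s3_client_kwargs())
#     s3.create_bucket(Bucket="watch-bucket-local")
#     s3.upload_file("sample.mp4", "watch-bucket-local", "inputs/sample.mp4")
```

Keeping the switch in one helper means the handler code stays identical between local runs and deployment.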
This pipeline offers a flexible blueprint for media processing in AWS. It’s scalable, customizable, and designed for clarity. Just plug in your infrastructure, tweak the knobs, and let the automation do the heavy lifting.
- Replace bucket names (`<watch-bucket>`, `<public-bucket>`)
- Replace placeholder ARNs and role names
- Configure lifecycle & CORS policies
- Compile and upload Lambda layers (`pymediainfo`, `ffmpeglayer`)
- Set up CloudWatch and SNS alerts for your team