Serverless Telegram bot that generates 10-second videos from text or voice instructions using a multi-agent AI pipeline with human-in-the-loop script approval.
- User sends text or voice message with video instructions
- AI generates a detailed video script (Claude Sonnet 4.5)
- AI refines the script for optimal video generation (Claude Sonnet 4.5)
- User approves the script and selects a video model (human-in-the-loop)
- AI animates the video directly from the script using kie.ai (Sora2 Pro or Kling)
- User receives the completed 10-second video
┌─────────────────────────────────────────────────────────────────┐
│                      TELEGRAM BOT WORKFLOW                      │
└─────────────────────────────────────────────────────────────────┘
User Message (Text/Voice)
        ↓
[Webhook Lambda] ──► Start Step Functions
        ↓
┌────────────────────────────────────────┐
│      STEP FUNCTIONS ORCHESTRATION      │
└────────────────────────────────────────┘
        ↓
Voice? ──Yes──► [Transcribe Lambda]
        │       (AWS Transcribe)
        No
        ↓
[Scripter Agent] ──► Claude Sonnet 4.5
        ↓            (Generate video script)
[Verifier Agent] ──► Claude Sonnet 4.5
        ↓            (Refine script)
[Send Script] ─────► User receives script preview
        ↓
⏸️ PAUSE: Wait for user approval
        ↓
User clicks: ✅ Approve or ✏️ Refine
        ↓
[Callback Handler] ──► Process approval/refinement
        ↓
[Image Generator] ──► Gemini 2.5 Flash Image
        ↓             (Create scene prompts)
[Video Animator] ───► kie.ai API
        ↓             (Start generation)
[kie-callback] ─────► Callback from kie.ai
        ↓
[Send Video] ───────► User receives video
| Component | Technology | Purpose |
|---|---|---|
| Infrastructure | Terraform + AWS | Serverless deployment |
| Orchestration | Step Functions | Multi-agent workflow with human-in-the-loop |
| Compute | Lambda (Node.js 22) | 8 serverless functions |
| Storage | S3 (7-day lifecycle) | Audio/video files |
| Database | DynamoDB | Job tracking & state |
| AI Models | OpenRouter | Unified API for Claude |
| Script Generation | Claude Sonnet 4.5 | Script writing & refinement |
| Video Generation | kie.ai (Sora2/Kling) | 10-second video animation |
| Voice Transcription | AWS Transcribe | Speech-to-text |
| Messaging | Telegram Bot API | User interface |
content-machine/
├── src/
│   ├── config/
│   │   └── prompts.mjs           # 🎯 Centralized AI prompts
│   ├── lib/
│   │   ├── secrets.mjs           # AWS Secrets Manager
│   │   ├── telegram.mjs          # Telegram API client
│   │   ├── dynamodb.mjs          # DynamoDB helpers
│   │   ├── openrouter.mjs        # OpenRouter API (Claude, Gemini)
│   │   └── kie.mjs               # kie.ai video generation
│   ├── webhook/index.mjs         # Telegram webhook & callback handler
│   ├── transcribe/index.mjs      # Voice → text (AWS Transcribe)
│   ├── scripter/index.mjs        # Script generation (Claude)
│   ├── verifier/index.mjs        # Script refinement (Claude)
│   ├── send-script/index.mjs     # Send script for approval
│   ├── video-animator/index.mjs  # Initiate video generation
│   ├── kie-callback/index.mjs    # Handle video completion callback
│   └── send-video/index.mjs      # Send video to user
├── main.tf                       # Core infrastructure
├── lambdas.tf                    # Lambda functions
├── step-functions.tf             # Workflow definition
├── variables.tf                  # Configuration
├── outputs.tf                    # Terraform outputs
├── package.json                  # Node.js dependencies
├── deploy.sh                     # Deployment script
└── terraform.tfvars.example      # Config template
- AWS Account with CLI configured
- Terraform >= 1.0
- Node.js >= 22
- API Keys: Telegram bot token, kie.ai API key, OpenRouter API key
# 1. Clone and install dependencies
cd content-machine
npm install
# 2. Configure API keys
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your API keys
# 3. Deploy infrastructure
./deploy.sh
# ✅ Done!
# Webhook is automatically set by Terraform.
# Monitoring links will be shown in the deployment output.

Send a message to your Telegram bot:
Create a 10-second video of a cat exploring a magical forest
Or send a voice message with your instructions!
# Required API Keys
telegram_bot_token = "123456:ABC-DEF..."
kie_api_key = "kie_..."
openrouter_api_key = "sk-or-..."
# Video Settings
video_duration = 10 # 10 or 15 seconds
kie_video_model = "sora-2-pro-text-to-video" # or "kling/v2-1-pro"
# Storage
s3_lifecycle_days = 7 # Auto-delete files after 7 days
# AI Models (OpenRouter)
claude_model_id = "anthropic/claude-sonnet-4.5"
gemini_image_model_id = "google/gemini-2.5-flash-image"
gemini_flash_model_id = "google/gemini-3-flash-preview"
# User Access Control
allowed_telegram_users = [] # Empty = open access
# allowed_telegram_users = ["user1", "user2"] # Restricted access

All AI prompts are centralized in src/config/prompts.mjs for easy editing:
export const PROMPTS = {
SCRIPTER: {
system: (videoDuration) => `Your custom scripter prompt...`,
user: (instruction, videoDuration) => `Create a ${videoDuration}-second video...`
},
VERIFIER: { /* ... */ },
IMAGE_GENERATOR: { /* ... */ },
REFINEMENT: { /* ... */ }
};

User sends a text or voice message via Telegram:
- Text: Direct instructions
- Voice: Transcribed using AWS Transcribe
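
The text/voice split above can be sketched as a small classifier over the incoming Telegram update. This is only an illustration: `classifyUpdate` and the shape it returns are hypothetical names, not the repository's code. The fields it reads (`message.text`, `message.voice.file_id`, `message.chat.id`, `message.from.id`) are standard Telegram Bot API fields.

```javascript
// Hypothetical helper: decide whether an update needs transcription first.
function classifyUpdate(update) {
  const message = update.message;
  if (!message) return { kind: "unsupported" };

  const base = { chatId: message.chat.id, userId: message.from?.id };

  if (typeof message.text === "string" && message.text.trim() !== "") {
    // Text instructions can go straight to the scripter.
    return { ...base, kind: "text", instruction: message.text.trim() };
  }
  if (message.voice?.file_id) {
    // Voice notes carry a file_id; the audio must be fetched and transcribed.
    return { ...base, kind: "voice", fileId: message.voice.file_id };
  }
  return { ...base, kind: "unsupported" };
}
```

A workflow input built this way lets a Step Functions choice state branch on `kind` without inspecting the raw update again.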
Scripter Agent (Claude Sonnet 4.5) creates a detailed script:
{
"title": "Cat in Magical Forest",
"totalDuration": 10,
"scenes": [
{
"sceneNumber": 1,
"duration": 3,
"visualDescription": "A curious orange tabby cat...",
"narration": "Once upon a time...",
"cameraAngle": "wide shot",
"keyElements": ["cat", "forest", "glowing mushrooms"]
}
]
}

Verifier Agent (Claude Sonnet 4.5) optimizes the script:
- Ensures timing adds up to exactly 10 seconds
- Enhances visual descriptions for AI generation
- Adds visual continuity between scenes
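
The timing rule is easy to check deterministically before or after the model call. The sketch below assumes the script JSON shape shown earlier (`scenes[].duration`); `checkTiming` is a hypothetical helper, and whether the repository performs such a check in code or relies on the prompt alone is not stated here.

```javascript
// Hypothetical helper: verify that scene durations sum to the target length.
function checkTiming(script, targetSeconds = 10) {
  const total = script.scenes.reduce((sum, scene) => sum + scene.duration, 0);
  const difference = targetSeconds - total;
  // Small tolerance so fractional durations do not fail on float rounding.
  return { total, difference, ok: Math.abs(difference) < 1e-9 };
}
```

If `ok` is false, `difference` says how many seconds to add (positive) or trim (negative), which could be fed back into a refinement prompt.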
User receives script preview with Telegram inline buttons:
📝 Video Script Preview
Title: Cat in Magical Forest
Duration: 10 seconds
Scenes:
1. Scene 1 (3s)
📹 A curious orange tabby cat...
🗣️ "Once upon a time..."
🎬 wide shot
[✅ Approve] [✏️ Refine]
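
A preview like this maps onto a Telegram `sendMessage` payload with an inline keyboard. The sketch below is a guess at how such a payload could be built from the script JSON; `buildScriptPreview` and the `callback_data` values `"approve"` and `"refine"` are assumptions, not taken from the repository. `chat_id`, `text`, and `reply_markup.inline_keyboard` are real Bot API parameters.

```javascript
// Hypothetical helper: build a sendMessage payload for the script preview.
function buildScriptPreview(chatId, script) {
  const sceneBlocks = script.scenes.map((scene) =>
    [
      `${scene.sceneNumber}. Scene ${scene.sceneNumber} (${scene.duration}s)`,
      `📹 ${scene.visualDescription}`,
      `🗣️ "${scene.narration}"`,
      `🎬 ${scene.cameraAngle}`,
    ].join("\n")
  );

  const text = [
    "📝 Video Script Preview",
    `Title: ${script.title}`,
    `Duration: ${script.totalDuration} seconds`,
    "Scenes:",
    ...sceneBlocks,
  ].join("\n");

  return {
    chat_id: chatId,
    text,
    reply_markup: {
      // One row with two buttons; callback_data is limited to 64 bytes.
      inline_keyboard: [[
        { text: "✅ Approve", callback_data: "approve" },
        { text: "✏️ Refine", callback_data: "refine" },
      ]],
    },
  };
}
```

The returned object can be posted as JSON to the Bot API `sendMessage` method. Telegram caps message text at 4096 characters, so a long script may need truncating first.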
If user clicks "✅ Approve":
- User sees model selection with pricing:

✅ Script approved!
🎬 Choose your video animation model:
💰 Total cost includes: OpenRouter AI ($0.10) + AWS ($0.07) + kie.ai
[⚡ Sora2 Pro Standard - $0.92]
[✨ Sora2 Pro High Quality - $1.52]
[🎨 Kling v2.1 Pro - $1.02]

- User selects model and quality
- Workflow continues to video generation
If user clicks "✏️ Refine":
- Bot asks: "Please send your refinement instructions:"
- User sends feedback (e.g., "Make the cat orange and add more magical elements")
- Claude Sonnet 4.5 refines the script based on feedback
- Bot sends refined script with approval buttons again
- User can approve or refine again (unlimited iterations!)
Key Features:
- ⏸️ Step Functions pauses workflow (1-hour timeout)
- 🔄 Unlimited refinement iterations
- 💾 Each refinement builds on previous version
- 🎯 Context preserved throughout refinements
- 💰 User chooses model and sees exact cost before generation
The workflow continues directly to video generation using the approved script.
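
The pause-and-resume behavior described above matches the Step Functions callback pattern: the state machine hands out a task token, and a later Lambda resumes the execution with `SendTaskSuccess`. The sketch below only builds the input for that call. `buildResumeInput`, the `callback_data` values, and the `decision` strings are hypothetical; the `taskToken` and `output` fields are the real `SendTaskSuccess` parameters.

```javascript
// Hypothetical helper: turn a button press into SendTaskSuccess input.
function buildResumeInput(taskToken, callbackData) {
  const decisions = { approve: "APPROVED", refine: "REFINE_REQUESTED" };
  if (!Object.hasOwn(decisions, callbackData)) {
    throw new Error(`Unexpected callback data: ${callbackData}`);
  }
  // `output` must be a JSON string; it becomes the paused state's result.
  return { taskToken, output: JSON.stringify({ decision: decisions[callbackData] }) };
}

// Usage with AWS SDK v3 (not executed here):
//   import { SFNClient, SendTaskSuccessCommand } from "@aws-sdk/client-sfn";
//   const input = buildResumeInput(job.taskToken, "approve");
//   await new SFNClient({}).send(new SendTaskSuccessCommand(input));
```

A choice state after the paused task could then branch on `decision` to either continue to video generation or loop back for another refinement.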
Video Animator initiates the task:
- Combines all scene prompts
- Calls kie.ai API to start generation
- Provides a callback URL for completion notification
kie.ai Callback:
- Receives the completion webhook from kie.ai
- Downloads completed video
- Stores video in S3
- Resumes Step Functions workflow
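
The four callback steps can be sketched with injected dependencies to make their order explicit. Everything here is illustrative: the payload fields `jobId` and `videoUrl` are assumed and are not kie.ai's documented schema, and `loadJob`, `download`, `store`, and `resume` stand in for whatever DynamoDB, HTTP, S3, and Step Functions helpers the real handler uses.

```javascript
// Hypothetical orchestration of the kie-callback steps.
async function handleVideoCallback(payload, deps) {
  const { jobId, videoUrl } = payload; // assumed field names
  if (!jobId || !videoUrl) throw new Error("Callback payload is missing jobId or videoUrl");

  const job = await deps.loadJob(jobId);       // look up the stored task token
  const bytes = await deps.download(videoUrl); // fetch the finished video
  const videoKey = `videos/${jobId}.mp4`;      // assumed S3 key layout
  await deps.store(videoKey, bytes);           // persist before resuming
  await deps.resume(job.taskToken, { videoKey });

  return { jobId, videoKey };
}
```

Storing the file before resuming the workflow means the next state never runs against a missing object.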
Send Video delivers the final video to user via Telegram with caption.
Users choose from three options after approving the script:
| Model | Quality | kie.ai Cost | Total Cost* | Best For |
|---|---|---|---|---|
| Sora2 Pro | Standard | $0.75 | $0.92 | Most cost-effective |
| Sora2 Pro | High | $1.35 | $1.52 | Premium quality |
| Kling v2.1 Pro | Standard | $0.85 | $1.02 | Alternative style |
*Total includes: kie.ai + OpenRouter ($0.10) + AWS ($0.07)
| Service | Cost | Notes |
|---|---|---|
| OpenRouter | $0.10 | 2x Script Processing (Claude), ~$0.05 per call |
| AWS Lambda | $0.05 | 8 functions, ~3 min total |
| AWS Transcribe | $0.02 | If voice message |
| S3 + DynamoDB | $0.002 | Storage + queries |
| Step Functions | ~$0.0002 | 6-7 state transitions at $0.000025 each |
| kie.ai | $0.75 - $1.35 | User's choice |
With Sora2 Pro Standard ($0.92/video):
- 10 videos: $9.20
- 100 videos: $92
- 1000 videos: $920
With Sora2 Pro High ($1.52/video):
- 10 videos: $15.20
- 100 videos: $152
- 1000 videos: $1,520
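
The per-video totals and volume estimates above follow from simple addition, sketched below in integer cents to avoid floating-point drift. The prices come from the tables in this README; the option keys and function names are hypothetical.

```javascript
// Fixed per-video overhead and kie.ai prices, in cents (from the tables above).
const FIXED_CENTS = { openRouter: 10, aws: 7 };
const KIE_CENTS = {
  "sora2-pro-standard": 75,
  "sora2-pro-high": 135,
  "kling-v2.1-pro": 85,
};

// Total cost of one video for a given option, in cents.
function totalCents(option) {
  if (!Object.hasOwn(KIE_CENTS, option)) throw new Error(`Unknown option: ${option}`);
  return KIE_CENTS[option] + FIXED_CENTS.openRouter + FIXED_CENTS.aws;
}

// Estimated spend in dollars for a number of videos, as a 2-decimal string.
function estimateDollars(option, videos) {
  return ((totalCents(option) * videos) / 100).toFixed(2);
}
```

For example, `estimateDollars("sora2-pro-standard", 100)` gives `"92.00"`, matching the 100-video figure listed above.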
# AWS Console
https://console.aws.amazon.com/states/home
# Or get state machine ARN
terraform output state_machine_arn

# Real-time logs
aws logs tail /aws/lambda/content-machine-dev-webhook --follow
aws logs tail /aws/lambda/content-machine-dev-scripter --follow
aws logs tail /aws/lambda/content-machine-dev-video-animator --follow

# Get table name
terraform output jobs_table_name
# Query jobs
aws dynamodb scan --table-name content-machine-dev-jobs
- Check webhook:
curl "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/getWebhookInfo"
- Check Lambda logs:
aws logs tail /aws/lambda/content-machine-dev-webhook --follow
- Check Step Functions: AWS Console > Step Functions > Executions
- Verify API keys: AWS Console > Secrets Manager
- Check Lambda logs: video-animator for initiation, kie-callback for completion
- Check kie.ai credits: kie.ai/logs
aws transcribe list-transcription-jobs --status FAILED

After making code changes:
terraform apply
Terraform automatically detects code changes and updates Lambda functions.

Edit src/config/prompts.mjs and redeploy:
terraform apply

# Set environment variables
export JOBS_TABLE_NAME=content-machine-dev-jobs
export AUDIO_BUCKET_NAME=content-machine-dev-audio
export VIDEO_BUCKET_NAME=content-machine-dev-video
export VIDEO_DURATION=10
# ... other env vars
# Test a function
node -e "import('./src/scripter/index.mjs').then(m => m.handler({...}))"

- ✅ Multi-Agent AI Pipeline - Specialized agents for each task
- ✅ Human-in-the-Loop - User approves scripts before video generation
- ✅ Centralized Prompts - Easy to edit and iterate on AI behavior
- ✅ Serverless - No servers to manage, scales automatically
- ✅ Cost-Effective - ~$1 per video with 7-day S3 lifecycle
- ✅ Reliable - Step Functions with automatic retries
- ✅ Voice Support - AWS Transcribe for voice messages
- ✅ Telegram Native - Inline buttons for approval workflow
| Function | Purpose | Timeout | Memory |
|---|---|---|---|
| webhook | Telegram entry (Msg & Callback) | 60s | 512MB |
| transcribe | Voice to text | 180s | 512MB |
| scripter | Generate script (Claude) | 60s | 512MB |
| verifier | Refine script (Claude) | 60s | 512MB |
| send-script | Send script for approval | 30s | 512MB |
| video-animator | Initiate video (kie.ai) | 600s | 1024MB |
| kie-callback | Handle completion | 120s | 1024MB |
| send-video | Send to Telegram | 120s | 1024MB |
- audio: Voice files (7-day lifecycle)
- video: Generated videos (7-day lifecycle)
- jobs: Job tracking with status, timestamps, task tokens
- TTL: 30 days
- GSI: UserIdIndex (query jobs by user)
- users: User whitelist (optional)
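
Two details of the jobs table lend themselves to a short sketch: DynamoDB TTL attributes hold an expiry time in epoch seconds, and a GSI is queried by naming it in `IndexName`. The helpers below are hypothetical, and the key attribute name `userId` is an assumption. Only the index name `UserIdIndex` and the 30-day TTL come from this README.

```javascript
const THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60;

// Expiry timestamp (epoch seconds) for a job created at `nowMs`.
function ttlFrom(nowMs = Date.now()) {
  return Math.floor(nowMs / 1000) + THIRTY_DAYS_SECONDS;
}

// Query parameters for listing one user's jobs via the GSI.
// Shaped for the DocumentClient-style QueryCommand in @aws-sdk/lib-dynamodb.
function jobsByUserQuery(tableName, userId) {
  return {
    TableName: tableName,
    IndexName: "UserIdIndex",
    KeyConditionExpression: "userId = :uid", // assumed key attribute name
    ExpressionAttributeValues: { ":uid": userId },
  };
}
```

Querying the index avoids a full table scan when only one user's jobs are needed.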
- API Keys: Stored in AWS Secrets Manager
- IAM Roles: Least privilege access
- S3: Private buckets with lifecycle policies
- User Access: Optional whitelist via allowed_telegram_users
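
The whitelist rule (an empty list means open access) can be sketched as a single predicate. `isUserAllowed` is a hypothetical name. The sketch assumes the list holds Telegram usernames, as the `["user1", "user2"]` example in the configuration suggests, and compares them case-insensitively because Telegram usernames are not case-sensitive.

```javascript
// Hypothetical helper: apply the allowed_telegram_users rule.
function isUserAllowed(allowedUsers, username) {
  // Empty or missing list means the bot is open to everyone.
  if (!Array.isArray(allowedUsers) || allowedUsers.length === 0) return true;
  // With a whitelist in place, users without a username cannot match.
  if (!username) return false;
  const wanted = username.toLowerCase();
  return allowedUsers.some((entry) => String(entry).toLowerCase() === wanted);
}
```

Running this check in the webhook handler, before any workflow starts, would keep unauthorized requests from incurring AI or video costs.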
To destroy all resources:
terraform destroy

Warning: This deletes all S3 buckets, DynamoDB tables, and Lambda functions.
MIT
Built with:
- AWS Lambda
- OpenRouter (Claude, Gemini)
- kie.ai (Video generation)
- Telegram Bot API
- Terraform