Content Machine - AI Video Generation Telegram Bot

Serverless Telegram bot that generates 10-second videos from text or voice instructions using a multi-agent AI pipeline with human-in-the-loop script approval.

🎯 What It Does

User sends text or voice message with video instructions
AI generates a detailed video script (Claude Sonnet 4.5)
AI refines the script for optimal video generation (Claude Sonnet 4.5)
User approves the script and selects a video model (human-in-the-loop)
AI animates the video directly from the script using kie.ai (Sora2 Pro or Kling)
User receives the completed 10-second video

🏗️ Architecture

Multi-Agent Workflow

┌─────────────────────────────────────────────────────────────────┐
│                     TELEGRAM BOT WORKFLOW                        │
└─────────────────────────────────────────────────────────────────┘

User Message (Text/Voice)
         ↓
    [Webhook Lambda] ──→ Start Step Functions
         ↓
    ┌────────────────────────────────────────┐
    │      STEP FUNCTIONS ORCHESTRATION       │
    └────────────────────────────────────────┘
         ↓
    Voice? ──Yes──→ [Transcribe Lambda]
         ↓              (AWS Transcribe)
         No
         ↓
    [Scripter Agent] ──→ Claude Sonnet 4.5
         ↓              (Generate video script)
    [Verifier Agent] ──→ Claude Sonnet 4.5
         ↓              (Refine script)
    [Send Script] ──────→ User receives script preview
         ↓
    ⏸️  PAUSE: Wait for user approval
         ↓
    User clicks: ✅ Approve  or  ✏️ Refine
         ↓
    [Callback Handler] ──→ Process approval/refinement
         ↓
    [Video Animator] ───→ kie.ai API
                                  ↓
                          (Start generation)
                                  ↓
    [kie-callback] ←───── Callback from kie.ai
         ↓
    [Send Video] ───────→ User receives video
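
As a rough sketch of the first box in the diagram, the webhook has to tell text messages, voice messages, and button callbacks apart before it starts or resumes a workflow. The field names on `update` come from the Telegram Bot API; the function name `routeUpdate` and the returned shape are assumptions made for illustration, not the code in `src/webhook/index.mjs`.

```javascript
// Classify an incoming Telegram update so the webhook can decide what to do.
// The returned shape ({ kind, chatId, ... }) is an illustrative assumption.
function routeUpdate(update) {
  if (update.callback_query) {
    const cq = update.callback_query;
    return {
      kind: "callback",
      chatId: cq.message?.chat?.id, // message can be absent on some callbacks
      data: cq.data, // e.g. "approve" or "refine" (hypothetical values)
    };
  }
  const msg = update.message;
  if (msg?.voice) {
    // Voice notes must be downloaded and transcribed first.
    return { kind: "voice", chatId: msg.chat.id, fileId: msg.voice.file_id };
  }
  if (msg?.text) {
    return { kind: "text", chatId: msg.chat.id, text: msg.text };
  }
  return { kind: "ignored" };
}
```

A `text` or `voice` result would start a new Step Functions execution, while a `callback` result would be handed to the approval logic.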

Technology Stack

Component	Technology	Purpose
Infrastructure	Terraform + AWS	Serverless deployment
Orchestration	Step Functions	Multi-agent workflow with human-in-the-loop
Compute	Lambda (Node.js 22)	8 serverless functions
Storage	S3 (7-day lifecycle)	Audio/video files
Database	DynamoDB	Job tracking & state
AI Models	OpenRouter	Unified API for Claude
Script Generation	Claude Sonnet 4.5	Script writing & refinement
Video Generation	kie.ai (Sora2/Kling)	10-second video animation
Voice Transcription	AWS Transcribe	Speech-to-text
Messaging	Telegram Bot API	User interface

📁 Project Structure

content-machine/
├── src/
│   ├── config/
│   │   └── prompts.mjs              # 🎯 Centralized AI prompts
│   ├── lib/
│   │   ├── secrets.mjs              # AWS Secrets Manager
│   │   ├── telegram.mjs             # Telegram API client
│   │   ├── dynamodb.mjs             # DynamoDB helpers
│   │   ├── openrouter.mjs           # OpenRouter API (Claude, Gemini)
│   │   └── kie.mjs                  # kie.ai video generation
│   ├── webhook/index.mjs            # Telegram webhook & callback handler
│   ├── transcribe/index.mjs         # Voice → text (AWS Transcribe)
│   ├── scripter/index.mjs           # Script generation (Claude)
│   ├── verifier/index.mjs           # Script refinement (Claude)
│   ├── send-script/index.mjs        # Send script for approval
│   ├── video-animator/index.mjs     # Initiate video generation
│   ├── kie-callback/index.mjs       # Handle video completion callback
│   └── send-video/index.mjs         # Send video to user
├── main.tf                          # Core infrastructure
├── lambdas.tf                       # Lambda functions
├── step-functions.tf                # Workflow definition
├── variables.tf                     # Configuration
├── outputs.tf                       # Terraform outputs
├── package.json                     # Node.js dependencies
├── deploy.sh                        # Deployment script
└── terraform.tfvars.example         # Config template

🚀 Quick Start

Prerequisites

AWS Account with CLI configured
Terraform >= 1.0
Node.js >= 22
API keys: Telegram bot token, kie.ai, and OpenRouter (see terraform.tfvars under Configuration)

Installation

# 1. Clone and install dependencies
cd content-machine
npm install

# 2. Configure API keys
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your API keys

# 3. Deploy infrastructure
./deploy.sh

# ✅ Done! 
# Webhook is automatically set by Terraform.
# Monitoring links will be shown in the deployment output.

Test the Bot

Send a message to your Telegram bot:

Create a 10-second video of a cat exploring a magical forest

Or send a voice message with your instructions!

⚙️ Configuration

terraform.tfvars

# Required API Keys
telegram_bot_token = "123456:ABC-DEF..."
kie_api_key = "kie_..."
openrouter_api_key = "sk-or-..."

# Video Settings
video_duration = 10                    # 10 or 15 seconds
kie_video_model = "sora-2-pro-text-to-video"  # or "kling/v2-1-pro"

# Storage
s3_lifecycle_days = 7                  # Auto-delete files after 7 days

# AI Models (OpenRouter)
claude_model_id = "anthropic/claude-sonnet-4.5"
gemini_image_model_id = "google/gemini-2.5-flash-image"
gemini_flash_model_id = "google/gemini-3-flash-preview"

# User Access Control
allowed_telegram_users = []            # Empty = open access
# allowed_telegram_users = ["user1", "user2"]  # Restricted access
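
The access rule implied by `allowed_telegram_users` (an empty list means open access) can be sketched as below. This assumes the list holds Telegram usernames and that matching is case-insensitive; the helper name `isUserAllowed` is hypothetical and the real check may differ.

```javascript
// Returns true when the sender may use the bot.
// Assumption: allowedUsers contains Telegram usernames (without "@").
function isUserAllowed(username, allowedUsers) {
  // An empty or missing list means the bot is open to everyone.
  if (!Array.isArray(allowedUsers) || allowedUsers.length === 0) return true;
  // With a whitelist in place, users without a username cannot match.
  if (!username) return false;
  const wanted = username.toLowerCase();
  return allowedUsers.some((u) => String(u).toLowerCase() === wanted);
}
```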

Customizing AI Prompts

All AI prompts are centralized in src/config/prompts.mjs for easy editing:

export const PROMPTS = {
  SCRIPTER: {
    system: (videoDuration) => `Your custom scripter prompt...`,
    user: (instruction, videoDuration) => `Create a ${videoDuration}-second video...`
  },
  VERIFIER: { /* ... */ },
  IMAGE_GENERATOR: { /* ... */ },
  REFINEMENT: { /* ... */ }
};
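
To show how the prompt functions might be consumed, here is a minimal sketch that turns them into a chat-completion request body. The `PROMPTS` object is a stand-in with placeholder wording so the snippet runs on its own, and `buildScripterRequest` is a hypothetical helper, not necessarily what `src/lib/openrouter.mjs` does.

```javascript
// Stand-in for src/config/prompts.mjs; the prompt wording is a placeholder.
const PROMPTS = {
  SCRIPTER: {
    system: (videoDuration) => `You write ${videoDuration}-second video scripts.`,
    user: (instruction, videoDuration) =>
      `Create a ${videoDuration}-second video: ${instruction}`,
  },
};

// Build an OpenAI-style chat request body, the format OpenRouter accepts.
function buildScripterRequest(instruction, videoDuration, modelId) {
  return {
    model: modelId,
    messages: [
      { role: "system", content: PROMPTS.SCRIPTER.system(videoDuration) },
      { role: "user", content: PROMPTS.SCRIPTER.user(instruction, videoDuration) },
    ],
  };
}
```

The resulting object could be sent as JSON to `https://openrouter.ai/api/v1/chat/completions` with an `Authorization: Bearer <key>` header, using `claude_model_id` as `modelId`.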

🎬 How It Works

1. User Sends Request

User sends text or voice message via Telegram:

Text: Direct instructions
Voice: Transcribed using AWS Transcribe
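
For the voice path, a transcription job needs to point at the audio file in S3. The sketch below builds the input for AWS Transcribe's `StartTranscriptionJob`; the field names are the real API parameters, while the job-name prefix, the `en-US` language, and the helper name are assumptions. Telegram voice notes are OGG/Opus, which is why `MediaFormat` is set to `ogg` here.

```javascript
// Build StartTranscriptionJob input for a voice note stored in S3.
// Assumptions: job naming scheme and a fixed en-US language code.
function buildTranscriptionParams(jobId, audioBucket, audioKey) {
  return {
    TranscriptionJobName: `content-machine-${jobId}`,
    LanguageCode: "en-US",
    MediaFormat: "ogg", // Telegram voice messages are OGG with Opus audio
    Media: { MediaFileUri: `s3://${audioBucket}/${audioKey}` },
  };
}
```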

2. Script Generation

Scripter Agent (Claude Sonnet 4.5) creates a detailed script:

{
  "title": "Cat in Magical Forest",
  "totalDuration": 10,
  "scenes": [
    {
      "sceneNumber": 1,
      "duration": 3,
      "visualDescription": "A curious orange tabby cat...",
      "narration": "Once upon a time...",
      "cameraAngle": "wide shot",
      "keyElements": ["cat", "forest", "glowing mushrooms"]
    }
  ]
}

3. Script Refinement

Verifier Agent (Claude Sonnet 4.5) optimizes the script:

Ensures timing adds up to exactly 10 seconds
Enhances visual descriptions for AI generation
Adds visual continuity between scenes
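
The timing rule above is easy to verify in code as well as in the prompt. A minimal sketch, assuming the script has the JSON shape shown in step 2 (`checkSceneDurations` is a hypothetical helper name):

```javascript
// Check that scene durations add up to the expected total and that the
// script's own totalDuration field agrees with it.
function checkSceneDurations(script, expectedTotal) {
  const sum = script.scenes.reduce((acc, scene) => acc + scene.duration, 0);
  return {
    ok: sum === expectedTotal && script.totalDuration === expectedTotal,
    sum,
  };
}
```

A failing check could be fed back to the Verifier Agent as a correction request rather than passed on to the user.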

4. Human-in-the-Loop Approval

User receives script preview with Telegram inline buttons:

📝 Video Script Preview

Title: Cat in Magical Forest
Duration: 10 seconds

Scenes:
1. Scene 1 (3s)
   📹 A curious orange tabby cat...
   🗣️ "Once upon a time..."
   🎬 wide shot

[✅ Approve]  [✏️ Refine]
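
A preview like the one above could be produced with something along these lines. The text layout mirrors the example; `inline_keyboard`, `text`, and `callback_data` are real Telegram Bot API fields, whereas the function names and the `approve:<jobId>` / `refine:<jobId>` callback format are assumptions.

```javascript
// Render the script as the plain-text preview shown to the user.
function formatScriptPreview(script) {
  const lines = [
    "📝 Video Script Preview",
    "",
    `Title: ${script.title}`,
    `Duration: ${script.totalDuration} seconds`,
    "",
    "Scenes:",
  ];
  for (const s of script.scenes) {
    lines.push(`${s.sceneNumber}. Scene ${s.sceneNumber} (${s.duration}s)`);
    lines.push(`   📹 ${s.visualDescription}`);
    lines.push(`   🗣️ "${s.narration}"`);
    lines.push(`   🎬 ${s.cameraAngle}`);
  }
  return lines.join("\n");
}

// Build the reply_markup for the two approval buttons.
// Telegram limits callback_data to 64 bytes, so keep identifiers short.
function approvalKeyboard(jobId) {
  return {
    inline_keyboard: [[
      { text: "✅ Approve", callback_data: `approve:${jobId}` },
      { text: "✏️ Refine", callback_data: `refine:${jobId}` },
    ]],
  };
}
```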

If user clicks "✅ Approve":

User sees model selection with pricing:

✅ Script approved!

🎬 Choose your video animation model:

💰 Total cost includes: OpenRouter AI ($0.10) + AWS ($0.07) + kie.ai

[⚡ Sora2 Pro Standard - $0.92]
[✨ Sora2 Pro High Quality - $1.52]
[🎨 Kling v2.1 Pro - $1.02]

User selects model and quality
Workflow continues to video generation

If user clicks "✏️ Refine":

Bot asks: "Please send your refinement instructions:"
User sends feedback (e.g., "Make the cat orange and add more magical elements")
Claude Sonnet 4.5 refines the script based on feedback
Bot sends refined script with approval buttons again
User can approve or refine again (unlimited iterations!)
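
One way to make each refinement build on the previous version is to send the current script back to the model together with the new feedback. The sketch below assumes that approach; the system prompt text is a placeholder (the real one lives under `PROMPTS.REFINEMENT`) and `buildRefinementMessages` is a hypothetical name.

```javascript
// Build chat messages for one refinement round.
// The current script is included in full so earlier changes are preserved.
function buildRefinementMessages(currentScript, feedback) {
  return [
    {
      role: "system",
      // Placeholder wording, not the project's actual refinement prompt.
      content: "Revise the video script according to the user's feedback. Return the full script as JSON.",
    },
    {
      role: "user",
      content: `Current script:\n${JSON.stringify(currentScript, null, 2)}\n\nFeedback: ${feedback}`,
    },
  ];
}
```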

Key Features:

⏸️ Step Functions pauses workflow (1-hour timeout)
🔄 Unlimited refinement iterations
💾 Each refinement builds on previous version
🎯 Context preserved throughout refinements
💰 User chooses model and sees exact cost before generation

The workflow continues directly to video generation using the approved script.
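
Since the jobs table stores task tokens, the pause is presumably implemented with the Step Functions task-token callback pattern. Under that assumption, resuming the workflow after approval comes down to a `SendTaskSuccess` call. The sketch below only builds the call's input; `taskToken` and `output` are the real parameter names, while the job fields and the payload shape are hypothetical.

```javascript
// Build the input for Step Functions SendTaskSuccess after the user
// approves the script and picks a model. Payload shape is an assumption.
function buildTaskSuccessInput(job, decision) {
  return {
    taskToken: job.taskToken,
    // output must be a JSON string; it becomes the paused state's result.
    output: JSON.stringify({
      jobId: job.jobId,
      approved: true,
      model: decision.model,
      quality: decision.quality,
    }),
  };
}

// With the AWS SDK v3 this would be sent roughly as:
//   await sfn.send(new SendTaskSuccessCommand(buildTaskSuccessInput(job, decision)));
// where sfn is an SFNClient from @aws-sdk/client-sfn.
```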

5. Video Animation

Video Animator initiates the task:

Combines all scene prompts
Calls kie.ai API to start generation
Provides a callback URL for completion notification

kie.ai Callback:

Receives the completion webhook from kie.ai
Downloads the completed video
Stores the video in S3
Resumes the Step Functions workflow

6. Delivery

Send Video delivers the final video to the user via Telegram with a caption.
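
The "combines all scene prompts" part of the Video Animation step could be as simple as joining the scene descriptions into one text prompt. This is a sketch under that assumption; the exact prompt format sent to kie.ai is not documented here, and `combineScenePrompts` is a hypothetical helper.

```javascript
// Join the approved script's scenes into a single text-to-video prompt.
// One line per scene, keeping duration and camera angle as hints.
function combineScenePrompts(script) {
  return script.scenes
    .map(
      (s) =>
        `Scene ${s.sceneNumber} (${s.duration}s, ${s.cameraAngle}): ${s.visualDescription}`
    )
    .join("\n");
}
```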

💰 Cost Breakdown

Model Options (10-Second Video)

Users choose from three options after approving the script:

Model	Quality	kie.ai Cost	Total Cost*	Best For
Sora2 Pro	Standard	$0.75	$0.92	Most cost-effective
Sora2 Pro	High	$1.35	$1.52	Premium quality
Kling v2.1 Pro	Standard	$0.85	$1.02	Alternative style

*Total includes: kie.ai + OpenRouter ($0.10) + AWS ($0.07)
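
The totals in the table follow directly from the footnote. A small sketch of the arithmetic, working in cents to avoid floating-point rounding (the constant and function names are illustrative):

```javascript
// Fixed per-video overheads from the footnote, in cents.
const OPENROUTER_CENTS = 10;
const AWS_CENTS = 7;

// Total per-video cost for a given kie.ai price, in cents.
function totalCostCents(kieCents) {
  return kieCents + OPENROUTER_CENTS + AWS_CENTS;
}

// Format a cent amount as dollars with two decimals.
function formatUsd(cents) {
  return `$${(cents / 100).toFixed(2)}`;
}

// formatUsd(totalCostCents(75))  -> "$0.92" (Sora2 Pro Standard)
// formatUsd(totalCostCents(135)) -> "$1.52" (Sora2 Pro High)
// formatUsd(totalCostCents(85))  -> "$1.02" (Kling v2.1 Pro)
```

Multiplying by the number of videos gives a monthly estimate, for example `formatUsd(100 * totalCostCents(75))` for 100 Standard videos.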

Cost Components

Service	Cost	Notes
OpenRouter	$0.10	2 Claude calls (script generation + refinement), about $0.05 each
AWS Lambda	$0.05	8 functions, ~3 min total
AWS Transcribe	$0.02	If voice message
S3 + DynamoDB	$0.002	Storage + queries
Step Functions	$0.000025	6-7 state transitions
kie.ai	$0.75 - $1.35	User's choice

Monthly Estimates

With Sora2 Pro Standard ($0.92/video):

10 videos: $9.20
100 videos: $92
1000 videos: $920

With Sora2 Pro High ($1.52/video):

10 videos: $15.20
100 videos: $152
1000 videos: $1,520

🔍 Monitoring

View Step Functions Executions

# AWS Console
https://console.aws.amazon.com/states/home

# Or get state machine ARN
terraform output state_machine_arn

View Lambda Logs

# Real-time logs
aws logs tail /aws/lambda/content-machine-dev-webhook --follow
aws logs tail /aws/lambda/content-machine-dev-scripter --follow
aws logs tail /aws/lambda/content-machine-dev-video-animator --follow

Check Job Status

# Get table name
terraform output jobs_table_name

# Query jobs
aws dynamodb scan --table-name content-machine-dev-jobs

🐛 Troubleshooting

Bot Not Responding

Check webhook:

curl "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/getWebhookInfo"

Check Lambda logs:

aws logs tail /aws/lambda/content-machine-dev-webhook --follow
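
When reading the `getWebhookInfo` response, a few fields usually explain a silent bot. The helper below is a hypothetical sketch for interpreting that JSON; `ok`, `result.url`, `result.last_error_message`, and `result.pending_update_count` are fields from the Telegram Bot API, and the messages are illustrative.

```javascript
// Turn a parsed getWebhookInfo response into a one-line diagnosis.
function diagnoseWebhook(info) {
  if (!info || info.ok !== true) {
    return "getWebhookInfo call failed; check the bot token";
  }
  const r = info.result;
  if (!r.url) return "no webhook URL is registered";
  if (r.last_error_message) {
    return `Telegram reports a delivery error: ${r.last_error_message}`;
  }
  if (r.pending_update_count > 0) {
    return `${r.pending_update_count} update(s) pending delivery`;
  }
  return "webhook looks healthy";
}
```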

Video Generation Failing

Check Step Functions: AWS Console > Step Functions > Executions
Verify API keys: AWS Console > Secrets Manager
Check Lambda logs: video-animator for initiation, kie-callback for completion
Check kie.ai credits: kie.ai/logs

Transcription Errors

aws transcribe list-transcription-jobs --status FAILED

🔧 Development

Update Lambda Code

After making code changes:

terraform apply

Terraform automatically detects code changes and updates Lambda functions.

Update AI Prompts

Edit src/config/prompts.mjs and redeploy:

terraform apply

Local Testing

# Set environment variables
export JOBS_TABLE_NAME=content-machine-dev-jobs
export AUDIO_BUCKET_NAME=content-machine-dev-audio
export VIDEO_BUCKET_NAME=content-machine-dev-video
export VIDEO_DURATION=10
# ... other env vars

# Test a function (replace {...} with a sample event object before running)
node -e "import('./src/scripter/index.mjs').then(m => m.handler({...}))"

🎯 Key Features

✅ Multi-Agent AI Pipeline - Specialized agents for each task
✅ Human-in-the-Loop - User approves scripts before video generation
✅ Centralized Prompts - Easy to edit and iterate on AI behavior
✅ Serverless - No servers to manage, scales automatically
✅ Cost-Effective - ~$1 per video with 7-day S3 lifecycle
✅ Reliable - Step Functions with automatic retries
✅ Voice Support - AWS Transcribe for voice messages
✅ Telegram Native - Inline buttons for approval workflow

📊 Lambda Functions

Function	Purpose	Timeout	Memory
webhook	Telegram entry (Msg & Callback)	60s	512MB
transcribe	Voice to text	180s	512MB
scripter	Generate script (Claude)	60s	512MB
verifier	Refine script (Claude)	60s	512MB
send-script	Send script for approval	30s	512MB
video-animator	Initiate video (kie.ai)	600s	1024MB
kie-callback	Handle completion	120s	1024MB
send-video	Send to Telegram	120s	1024MB

🗄️ Storage

S3 Buckets

audio: Voice files (7-day lifecycle)
video: Generated videos (7-day lifecycle)

DynamoDB Tables

jobs: Job tracking with status, timestamps, task tokens
- TTL: 30 days
- GSI: UserIdIndex (query jobs by user)
users: User whitelist (optional)
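
The 30-day TTL and the `UserIdIndex` lookup on the jobs table can be sketched as follows. DynamoDB TTL expects an epoch timestamp in seconds, and the query input uses real DynamoDB parameter names; the `userId` attribute name and both helper names are assumptions.

```javascript
const THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60;

// Compute the TTL attribute value: epoch seconds, 30 days after nowMs.
function ttlFromNow(nowMs) {
  return Math.floor(nowMs / 1000) + THIRTY_DAYS_SECONDS;
}

// Build a Query input that lists one user's jobs via the GSI.
// Assumption: the index partition key attribute is named "userId".
function buildUserJobsQuery(tableName, userId) {
  return {
    TableName: tableName,
    IndexName: "UserIdIndex",
    KeyConditionExpression: "userId = :uid",
    ExpressionAttributeValues: { ":uid": userId },
  };
}
```

With the document client from `@aws-sdk/lib-dynamodb`, the query object can be passed to `QueryCommand` as is, and `ttlFromNow(Date.now())` would be stored on each new job item.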

🔐 Security

API Keys: Stored in AWS Secrets Manager
IAM Roles: Least privilege access
S3: Private buckets with lifecycle policies
User Access: Optional whitelist via allowed_telegram_users

🧹 Cleanup

To destroy all resources:

terraform destroy

Warning: This deletes all S3 buckets, DynamoDB tables, and Lambda functions.

📝 License

MIT

🙏 Credits

Built with:

AWS Lambda
OpenRouter (Claude, Gemini)
kie.ai (Video generation)
Telegram Bot API
Terraform
