Video generating Telegram bot with Kie.ai that works async with AWS Lambda + SQS

didiberman/content-machine

Content Machine - AI Video Generation Telegram Bot

Serverless Telegram bot that generates 10-second videos from text or voice instructions using a multi-agent AI pipeline with human-in-the-loop script approval.

🎯 What It Does

  1. User sends text or voice message with video instructions
  2. AI generates a detailed video script (Claude Sonnet 4.5)
  3. AI refines the script for optimal video generation (Claude Sonnet 4.5)
  4. User approves the script and selects a video model (human-in-the-loop)
  5. AI animates the video directly from the script using kie.ai (Sora2 Pro or Kling)
  6. User receives the completed 10-second video

🏗️ Architecture

Multi-Agent Workflow

┌─────────────────────────────────────────────────────────────────┐
│                     TELEGRAM BOT WORKFLOW                       │
└─────────────────────────────────────────────────────────────────┘

User Message (Text/Voice)
         ↓
    [Webhook Lambda] ──→ Start Step Functions
         ↓
    ┌────────────────────────────────────────┐
    │      STEP FUNCTIONS ORCHESTRATION      │
    └────────────────────────────────────────┘
         ↓
    Voice? ──Yes──→ [Transcribe Lambda]
         ↓              (AWS Transcribe)
         No
         ↓
    [Scripter Agent] ──→ Claude Sonnet 4.5
         ↓              (Generate video script)
    [Verifier Agent] ──→ Claude Sonnet 4.5
         ↓              (Refine script)
    [Send Script] ──────→ User receives script preview
         ↓
    ⏸️  PAUSE: Wait for user approval
         ↓
    User clicks: ✅ Approve  or  ✏️ Refine
         ↓
    [Callback Handler] ──→ Process approval/refinement
         ↓
    [Image Generator] ──→ Gemini 2.5 Flash Image
         ↓              (Create scene prompts)
    [Video Animator] ───→ kie.ai API
                                  ↓
                          (Start generation)
                                  ↓
    [kie-callback] ←───── Callback from kie.ai
         ↓
    [Send Video] ───────→ User receives video

Technology Stack

| Component | Technology | Purpose |
|---|---|---|
| Infrastructure | Terraform + AWS | Serverless deployment |
| Orchestration | Step Functions | Multi-agent workflow with human-in-the-loop |
| Compute | Lambda (Node.js 22) | 8 serverless functions |
| Storage | S3 (7-day lifecycle) | Audio/video files |
| Database | DynamoDB | Job tracking & state |
| AI Models | OpenRouter | Unified API for Claude |
| Script Generation | Claude Sonnet 4.5 | Script writing & refinement |
| Video Generation | kie.ai (Sora2/Kling) | 10-second video animation |
| Voice Transcription | AWS Transcribe | Speech-to-text |
| Messaging | Telegram Bot API | User interface |

📁 Project Structure

content-machine/
├── src/
│   ├── config/
│   │   └── prompts.mjs              # 🎯 Centralized AI prompts
│   ├── lib/
│   │   ├── secrets.mjs              # AWS Secrets Manager
│   │   ├── telegram.mjs             # Telegram API client
│   │   ├── dynamodb.mjs             # DynamoDB helpers
│   │   ├── openrouter.mjs           # OpenRouter API (Claude, Gemini)
│   │   └── kie.mjs                  # kie.ai video generation
│   ├── webhook/index.mjs            # Telegram webhook & callback handler
│   ├── transcribe/index.mjs         # Voice → text (AWS Transcribe)
│   ├── scripter/index.mjs           # Script generation (Claude)
│   ├── verifier/index.mjs           # Script refinement (Claude)
│   ├── send-script/index.mjs        # Send script for approval
│   ├── video-animator/index.mjs     # Initiate video generation
│   ├── kie-callback/index.mjs       # Handle video completion callback
│   └── send-video/index.mjs         # Send video to user
├── main.tf                          # Core infrastructure
├── lambdas.tf                       # Lambda functions
├── step-functions.tf                # Workflow definition
├── variables.tf                     # Configuration
├── outputs.tf                       # Terraform outputs
├── package.json                     # Node.js dependencies
├── deploy.sh                        # Deployment script
└── terraform.tfvars.example         # Config template

🚀 Quick Start

Prerequisites

  1. AWS Account with CLI configured
  2. Terraform >= 1.0
  3. Node.js >= 22
  4. API Keys:

Installation

# 1. Clone and install dependencies
git clone https://github.com/didiberman/content-machine.git
cd content-machine
npm install

# 2. Configure API keys
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your API keys

# 3. Deploy infrastructure
./deploy.sh

# ✅ Done!
# Webhook is automatically set by Terraform.
# Monitoring links will be shown in the deployment output.

Test the Bot

Send a message to your Telegram bot:

Create a 10-second video of a cat exploring a magical forest

Or send a voice message with your instructions!

⚙️ Configuration

terraform.tfvars

# Required API Keys
telegram_bot_token = "123456:ABC-DEF..."
kie_api_key = "kie_..."
openrouter_api_key = "sk-or-..."

# Video Settings
video_duration = 10                    # 10 or 15 seconds
kie_video_model = "sora-2-pro-text-to-video"  # or "kling/v2-1-pro"

# Storage
s3_lifecycle_days = 7                  # Auto-delete files after 7 days

# AI Models (OpenRouter)
claude_model_id = "anthropic/claude-sonnet-4.5"
gemini_image_model_id = "google/gemini-2.5-flash-image"
gemini_flash_model_id = "google/gemini-3-flash-preview"

# User Access Control
allowed_telegram_users = []            # Empty = open access
# allowed_telegram_users = ["user1", "user2"]  # Restricted access
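
The access rule in the comments above (an empty list means open access) could be enforced in the webhook with a check along these lines. This is only a sketch: the `isUserAllowed` helper is hypothetical, and the case-insensitive match with an optional leading `@` is an assumption, not something the repository documents.

```javascript
// Hypothetical helper: decides whether a Telegram username may use the bot.
// An empty allow-list means open access, matching the tfvars comment.
function isUserAllowed(username, allowedUsers) {
  if (!Array.isArray(allowedUsers) || allowedUsers.length === 0) {
    return true; // open access
  }
  if (typeof username !== 'string' || username.length === 0) {
    return false; // restricted mode, but the sender has no username
  }
  // Assumption: compare case-insensitively and ignore a leading "@".
  const wanted = username.replace(/^@/, '').toLowerCase();
  return allowedUsers.some(
    (entry) => String(entry).replace(/^@/, '').toLowerCase() === wanted
  );
}

console.log(isUserAllowed('user1', []));                    // true
console.log(isUserAllowed('User1', ['user1', 'user2']));    // true
console.log(isUserAllowed('stranger', ['user1', 'user2'])); // false
```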

Customizing AI Prompts

All AI prompts are centralized in src/config/prompts.mjs for easy editing:

export const PROMPTS = {
  SCRIPTER: {
    system: (videoDuration) => `Your custom scripter prompt...`,
    user: (instruction, videoDuration) => `Create a ${videoDuration}-second video...`
  },
  VERIFIER: { /* ... */ },
  IMAGE_GENERATOR: { /* ... */ },
  REFINEMENT: { /* ... */ }
};
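
To illustrate how these prompt functions might feed a model call, the sketch below builds an OpenAI-style chat completion body, which is the request format OpenRouter accepts. The `PROMPTS` object here is a stand-in with placeholder wording, and `buildScripterRequest` is a hypothetical helper; the actual code in src/lib/openrouter.mjs may look different.

```javascript
// Stand-in for src/config/prompts.mjs so the sketch runs on its own.
// The prompt wording is a placeholder, not the repository's real prompt.
const PROMPTS = {
  SCRIPTER: {
    system: (videoDuration) =>
      `You write scripts for ${videoDuration}-second videos.`,
    user: (instruction, videoDuration) =>
      `Create a ${videoDuration}-second video script for: ${instruction}`,
  },
};

// Hypothetical helper: turns the prompt templates into a chat request body.
function buildScripterRequest(instruction, videoDuration, model) {
  return {
    model,
    messages: [
      { role: 'system', content: PROMPTS.SCRIPTER.system(videoDuration) },
      { role: 'user', content: PROMPTS.SCRIPTER.user(instruction, videoDuration) },
    ],
  };
}

const requestBody = buildScripterRequest(
  'a cat exploring a magical forest',
  10,
  'anthropic/claude-sonnet-4.5'
);
console.log(JSON.stringify(requestBody, null, 2));
```

Keeping the templates as functions of `videoDuration` means a change to `video_duration` in terraform.tfvars would flow into both the system and user prompts without editing the text.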

🎬 How It Works

1. User Sends Request

User sends text or voice message via Telegram:

  • Text: Direct instructions
  • Voice: Transcribed using AWS Transcribe
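
A webhook handler has to tell these input types apart before starting the workflow. The sketch below shows one possible way to do that. The field names (`message.text`, `message.voice.file_id`, `callback_query.data`) come from the Telegram Bot API `Update` object, while `classifyUpdate` and its return shape are hypothetical.

```javascript
// Hypothetical helper: classifies an incoming Telegram update so the
// webhook can route it (button press, voice message, or plain text).
function classifyUpdate(update) {
  if (!update) {
    return { kind: 'unsupported' };
  }
  if (update.callback_query) {
    // Inline button press, e.g. Approve or Refine.
    return { kind: 'callback', data: update.callback_query.data };
  }
  const message = update.message;
  if (message && message.voice) {
    // Voice messages carry a file_id that must be downloaded and transcribed.
    return { kind: 'voice', fileId: message.voice.file_id };
  }
  if (message && typeof message.text === 'string') {
    return { kind: 'text', text: message.text };
  }
  return { kind: 'unsupported' };
}

console.log(classifyUpdate({ message: { text: 'A cat in a forest' } }));
console.log(classifyUpdate({ message: { voice: { file_id: 'abc123' } } }));
```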

2. Script Generation

Scripter Agent (Claude Sonnet 4.5) creates a detailed script:

{
  "title": "Cat in Magical Forest",
  "totalDuration": 10,
  "scenes": [
    {
      "sceneNumber": 1,
      "duration": 3,
      "visualDescription": "A curious orange tabby cat...",
      "narration": "Once upon a time...",
      "cameraAngle": "wide shot",
      "keyElements": ["cat", "forest", "glowing mushrooms"]
    }
  ]
}

3. Script Refinement

Verifier Agent (Claude Sonnet 4.5) optimizes the script:

  • Ensures timing adds up to exactly 10 seconds
  • Enhances visual descriptions for AI generation
  • Adds visual continuity between scenes

4. Human-in-the-Loop Approval

User receives script preview with Telegram inline buttons:

📝 Video Script Preview

Title: Cat in Magical Forest
Duration: 10 seconds

Scenes:
1. Scene 1 (3s)
   📹 A curious orange tabby cat...
   🗣️ "Once upon a time..."
   🎬 wide shot

[✅ Approve]  [✏️ Refine]
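
A preview with two buttons like this maps onto a Telegram `sendMessage` payload with an `inline_keyboard` in `reply_markup`. The sketch below shows one way to build it. `buildApprovalMessage` and the `approve:<jobId>` / `refine:<jobId>` callback format are assumptions for illustration; the repository may encode its callbacks differently.

```javascript
// Hypothetical helper: builds the sendMessage payload for the script preview.
// Telegram limits callback_data to 64 bytes, so only a short action and
// job id are stored there (this encoding is an assumption).
function buildApprovalMessage(chatId, previewText, jobId) {
  return {
    chat_id: chatId,
    text: previewText,
    reply_markup: {
      inline_keyboard: [
        [
          { text: '✅ Approve', callback_data: `approve:${jobId}` },
          { text: '✏️ Refine', callback_data: `refine:${jobId}` },
        ],
      ],
    },
  };
}

const payload = buildApprovalMessage(
  123456789,
  'Video Script Preview\n\nTitle: Cat in Magical Forest\nDuration: 10 seconds',
  'job-1'
);
console.log(JSON.stringify(payload.reply_markup, null, 2));
```

Both buttons sit in a single inner array, which Telegram renders as one row.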

If user clicks "✅ Approve":

  • User sees model selection with pricing:
    ✅ Script approved!
    
    🎬 Choose your video animation model:
    
    💰 Total cost includes: OpenRouter AI ($0.10) + AWS ($0.07) + kie.ai
    
    [⚡ Sora2 Pro Standard - $0.92]
    [✨ Sora2 Pro High Quality - $1.52]
    [🎨 Kling v2.1 Pro - $1.02]
    
  • User selects model and quality
  • Workflow continues to video generation

If user clicks "✏️ Refine":

  1. Bot asks: "Please send your refinement instructions:"
  2. User sends feedback (e.g., "Make the cat orange and add more magical elements")
  3. Claude Sonnet 4.5 refines the script based on feedback
  4. Bot sends refined script with approval buttons again
  5. User can approve or refine again (unlimited iterations!)

Key Features:

  • ⏸️ Step Functions pauses workflow (1-hour timeout)
  • 🔄 Unlimited refinement iterations
  • 💾 Each refinement builds on previous version
  • 🎯 Context preserved throughout refinements
  • 💰 User chooses model and sees exact cost before generation

The workflow continues directly to video generation using the approved script.

6. Video Animation

Video Animator initiates the task:

  • Combines all scene prompts
  • Calls kie.ai API to start generation
  • Provides a callback URL for completion notification
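
The first step, combining the scene prompts, might look like the sketch below. It assumes the scene fields from the Script Generation example (`sceneNumber`, `duration`, `cameraAngle`, `visualDescription`). `combineScenePrompts` and the line format it produces are hypothetical, and the kie.ai request itself is left out because this README does not show its format.

```javascript
// Hypothetical helper: flattens an approved script into one text prompt,
// with scenes in order and one line per scene.
function combineScenePrompts(script) {
  return script.scenes
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => a.sceneNumber - b.sceneNumber)
    .map(
      (scene) =>
        `Scene ${scene.sceneNumber} (${scene.duration}s, ${scene.cameraAngle}): ` +
        scene.visualDescription
    )
    .join('\n');
}

const approvedScript = {
  title: 'Cat in Magical Forest',
  totalDuration: 10,
  scenes: [
    { sceneNumber: 2, duration: 7, cameraAngle: 'close-up', visualDescription: 'The cat sniffs a glowing mushroom.' },
    { sceneNumber: 1, duration: 3, cameraAngle: 'wide shot', visualDescription: 'A curious orange tabby cat enters the forest.' },
  ],
};

console.log(combineScenePrompts(approvedScript));
```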

kie.ai Callback:

  • Receives the completion webhook from kie.ai
  • Downloads completed video
  • Stores video in S3
  • Resumes Step Functions workflow
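
Resuming a paused Step Functions execution normally means calling `SendTaskSuccess` with the task token stored for the job. The sketch below assumes that pattern. `buildResumeParams`, `resumeWorkflow`, and the `jobId` / `videoKey` output fields are hypothetical names, and the sender is injected so the example runs without the AWS SDK.

```javascript
// Builds the input for Step Functions SendTaskSuccess:
// the task token plus a JSON string as output.
function buildResumeParams(taskToken, result) {
  return { taskToken, output: JSON.stringify(result) };
}

// `sendTaskSuccess` is injected. With AWS SDK v3 it could be:
//   (params) => sfnClient.send(new SendTaskSuccessCommand(params))
async function resumeWorkflow(sendTaskSuccess, job, videoKey) {
  const params = buildResumeParams(job.taskToken, { jobId: job.jobId, videoKey });
  await sendTaskSuccess(params);
  return params;
}

// Usage with a fake sender that only records the call.
const sentCalls = [];
resumeWorkflow(
  async (params) => { sentCalls.push(params); },
  { jobId: 'job-1', taskToken: 'token-123' },
  'videos/job-1.mp4'
).then((params) => console.log(params.output));
```

Injecting the sender keeps the callback logic testable without AWS credentials. The real handler would pass a function that wraps the SFN client.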

7. Delivery

The Send Video function delivers the final video to the user via Telegram, with a caption.

💰 Cost Breakdown

Model Options (10-Second Video)

Users choose from three options after approving the script:

| Model | Quality | kie.ai Cost | Total Cost* | Best For |
|---|---|---|---|---|
| Sora2 Pro | Standard | $0.75 | $0.92 | Most cost-effective |
| Sora2 Pro | High | $1.35 | $1.52 | Premium quality |
| Kling v2.1 Pro | Standard | $0.85 | $1.02 | Alternative style |

*Total includes: kie.ai + OpenRouter ($0.10) + AWS ($0.07)

Cost Components

| Service | Cost | Notes |
|---|---|---|
| OpenRouter | $0.05 | 2x Script Processing (Claude) |
| AWS Lambda | $0.05 | 8 functions, ~3 min total |
| AWS Transcribe | $0.02 | If voice message |
| S3 + DynamoDB | $0.002 | Storage + queries |
| Step Functions | $0.000025 | 6-7 state transitions |
| kie.ai | $0.75 - $1.35 | User's choice |

Monthly Estimates

With Sora2 Pro Standard ($0.92/video):

  • 10 videos: $9.20
  • 100 videos: $92
  • 1000 videos: $920

With Sora2 Pro High ($1.52/video):

  • 10 videos: $15.20
  • 100 videos: $152
  • 1000 videos: $1,520
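
These estimates follow from the per-video totals: kie.ai cost plus $0.10 for OpenRouter and $0.07 for AWS. The sketch below reproduces the arithmetic using the prices listed in this README. The model keys and function names are made up for the example.

```javascript
// Prices taken from the tables in this README; the keys are illustrative.
const FIXED_COSTS = { openRouter: 0.10, aws: 0.07 };
const KIE_COSTS = {
  'sora2-pro-standard': 0.75,
  'sora2-pro-high': 1.35,
  'kling-v2.1-pro': 0.85,
};

// Total cost of one video in whole cents (avoids floating-point drift).
function costPerVideoCents(modelKey) {
  const kie = KIE_COSTS[modelKey];
  if (kie === undefined) {
    throw new Error(`Unknown model: ${modelKey}`);
  }
  return (
    Math.round(kie * 100) +
    Math.round(FIXED_COSTS.openRouter * 100) +
    Math.round(FIXED_COSTS.aws * 100)
  );
}

// Estimated monthly spend in dollars for a given number of videos.
function monthlyEstimate(modelKey, videos) {
  return (costPerVideoCents(modelKey) * videos) / 100;
}

console.log(costPerVideoCents('sora2-pro-standard') / 100); // 0.92
console.log(monthlyEstimate('sora2-pro-standard', 100));    // 92
console.log(monthlyEstimate('sora2-pro-high', 1000));       // 1520
```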

🔍 Monitoring

View Step Functions Executions

# AWS Console
https://console.aws.amazon.com/states/home

# Or get state machine ARN
terraform output state_machine_arn

View Lambda Logs

# Real-time logs
aws logs tail /aws/lambda/content-machine-dev-webhook --follow
aws logs tail /aws/lambda/content-machine-dev-scripter --follow
aws logs tail /aws/lambda/content-machine-dev-video-animator --follow

Check Job Status

# Get table name
terraform output jobs_table_name

# Query jobs
aws dynamodb scan --table-name content-machine-dev-jobs

🐛 Troubleshooting

Bot Not Responding

  1. Check webhook:

    curl "https://api.telegram.org/bot${TELEGRAM_BOT_TOKEN}/getWebhookInfo"
  2. Check Lambda logs:

    aws logs tail /aws/lambda/content-machine-dev-webhook --follow
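
The JSON returned by `getWebhookInfo` can be read by hand, or summarized with a small script like the sketch below. The fields used (`result.url`, `result.last_error_message`, `result.pending_update_count`) are part of the Telegram Bot API `WebhookInfo` object. The `diagnoseWebhook` helper and its messages are hypothetical.

```javascript
// Hypothetical helper: turns a getWebhookInfo response into a list of
// likely problems. An empty list means nothing obvious is wrong.
function diagnoseWebhook(info) {
  const result = (info && info.result) || {};
  const problems = [];
  if (!result.url) {
    problems.push('No webhook URL is set');
  }
  if (result.last_error_message) {
    problems.push(`Last delivery error: ${result.last_error_message}`);
  }
  if (result.pending_update_count > 0) {
    problems.push(`${result.pending_update_count} updates are pending`);
  }
  return problems;
}

const sampleResponse = {
  ok: true,
  result: {
    url: 'https://example.com/webhook',
    pending_update_count: 3,
    last_error_message: 'Wrong response from the webhook: 502 Bad Gateway',
  },
};

console.log(diagnoseWebhook(sampleResponse));
```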

Video Generation Failing

  1. Check Step Functions: AWS Console > Step Functions > Executions
  2. Verify API keys: AWS Console > Secrets Manager
  3. Check Lambda logs: video-animator for initiation, kie-callback for completion
  4. Check kie.ai credits: kie.ai/logs

Transcription Errors

aws transcribe list-transcription-jobs --status FAILED

🔧 Development

Update Lambda Code

After making code changes:

terraform apply

Terraform automatically detects code changes and updates Lambda functions.

Update AI Prompts

Edit src/config/prompts.mjs and redeploy:

terraform apply

Local Testing

# Set environment variables
export JOBS_TABLE_NAME=content-machine-dev-jobs
export AUDIO_BUCKET_NAME=content-machine-dev-audio
export VIDEO_BUCKET_NAME=content-machine-dev-video
export VIDEO_DURATION=10
# ... other env vars

# Test a function
node -e "import('./src/scripter/index.mjs').then(m => m.handler({...}))"

🎯 Key Features

✅ Multi-Agent AI Pipeline - Specialized agents for each task
✅ Human-in-the-Loop - User approves scripts before video generation
✅ Centralized Prompts - Easy to edit and iterate on AI behavior
✅ Serverless - No servers to manage, scales automatically
✅ Cost-Effective - ~$1 per video with 7-day S3 lifecycle
✅ Reliable - Step Functions with automatic retries
✅ Voice Support - AWS Transcribe for voice messages
✅ Telegram Native - Inline buttons for approval workflow

📊 Lambda Functions

| Function | Purpose | Timeout | Memory |
|---|---|---|---|
| webhook | Telegram entry (Msg & Callback) | 60s | 512MB |
| transcribe | Voice to text | 180s | 512MB |
| scripter | Generate script (Claude) | 60s | 512MB |
| verifier | Refine script (Claude) | 60s | 512MB |
| send-script | Send script for approval | 30s | 512MB |
| video-animator | Initiate video (kie.ai) | 600s | 1024MB |
| kie-callback | Handle completion | 120s | 1024MB |
| send-video | Send to Telegram | 120s | 1024MB |

🗄️ Storage

S3 Buckets

  • audio: Voice files (7-day lifecycle)
  • video: Generated videos (7-day lifecycle)

DynamoDB Tables

  • jobs: Job tracking with status, timestamps, task tokens
    • TTL: 30 days
    • GSI: UserIdIndex (query jobs by user)
  • users: User whitelist (optional)
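
DynamoDB TTL expects the expiry time as a number of epoch seconds stored on each item. A 30-day TTL for a job record could therefore be computed as in the sketch below. The `ttlEpochSeconds` helper and the `ttl` attribute name are assumptions; the actual attribute name is set in the Terraform table definition.

```javascript
// Hypothetical helper: returns the epoch-seconds timestamp at which an
// item created at `nowMs` (milliseconds) should expire.
function ttlEpochSeconds(nowMs, days) {
  const secondsPerDay = 24 * 60 * 60;
  return Math.floor(nowMs / 1000) + days * secondsPerDay;
}

// Example job item; "ttl" is an assumed attribute name.
const createdAt = Date.UTC(2025, 0, 1); // 2025-01-01T00:00:00Z
const jobItem = {
  jobId: 'job-1',
  status: 'PENDING',
  ttl: ttlEpochSeconds(createdAt, 30),
};

console.log(jobItem.ttl); // 1738281600
```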

🔐 Security

  • API Keys: Stored in AWS Secrets Manager
  • IAM Roles: Least privilege access
  • S3: Private buckets with lifecycle policies
  • User Access: Optional whitelist via allowed_telegram_users

🧹 Cleanup

To destroy all resources:

terraform destroy

Warning: This deletes all S3 buckets, DynamoDB tables, and Lambda functions.

📝 License

MIT

🙏 Credits

Built with:
