Audio • Captions • SEO • Translations • Knowledge Base
End-to-end AI automation for video processing with contextual intelligence.
VAI0 (Video Auto Intelligence Operator) is an end-to-end CLI workflow that converts your raw videos into multilingual, SEO-optimized YouTube assets (captions, titles, and descriptions), enhanced with contextual knowledge from your own documents.
| Stage | Description |
|---|---|
| Audio Extraction | Extracts .mp3 from your video using FFmpeg |
| Caption Generation | Transcribes or translates audio to .srt via Whisper |
| TD Generation | Builds an SEO-optimized Title + Description (TD) using Ollama with template support |
| TD Translation | Localizes TDs into multiple target languages with cultural adaptation |
| Caption Translation | Produces synchronized .srt subtitles in all supported languages |
| Knowledge Base | Enhances generation with domain-specific context (PDFs, docs, guides) |
| Auto Resume | Tracks progress in .vaio.json, enabling vaio continue |
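To illustrate what "synchronized" means for the caption translation stage, here is a minimal sketch that translates only the text of each SRT cue and leaves cue numbers and timestamps untouched. It is not VAI0's actual implementation; `translate_srt` and the `translate` callback are hypothetical names, and the callback stands in for whatever model does the translation.

```python
import re
from typing import Callable

# Timing line of an SRT cue, e.g. "00:00:01,000 --> 00:00:02,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")


def translate_srt(srt_text: str, translate: Callable[[str], str]) -> str:
    """Translate cue text only; keep cue indices and timestamps unchanged."""
    out_blocks = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # Locate the timing line; everything after it is spoken text.
        for i, line in enumerate(lines):
            if TIMING.match(line.strip()):
                head, text = lines[: i + 1], lines[i + 1:]
                break
        else:
            out_blocks.append(block)  # not a recognizable cue: pass through
            continue
        translated = translate("\n".join(text)) if text else ""
        out_blocks.append("\n".join(head + ([translated] if translated else [])))
    return "\n\n".join(out_blocks) + "\n"
```

Because the timing lines are copied verbatim, the translated file stays aligned with the audio regardless of how the text length changes.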
VAI0 uses a modular operator model where each stage can run independently or in sequence:
VAI0/
├── config.yml
├── vaio/                 # Core framework
│   ├── cli.py            # CLI Controller
│   ├── core/             # Base utilities & stage implementations
│   └── kb/               # Knowledge Base integration
├── knowledge/            # Domain knowledge sources
│   └── default/          # Default reference materials
└── data/                 # Persistent data
    └── kb/               # Vector store (ChromaDB)
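The idea of stages that run in sequence and can be resumed can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the stage names, the `completed` key in the meta file, and the `run_pipeline` helper are all hypothetical, since the real layout of .vaio.json is not documented here.

```python
import json
from pathlib import Path
from typing import Callable

# Stage order follows the stage table; the identifiers are illustrative.
STAGES = ["audio", "captions", "description", "translate_td", "translate_captions"]


def load_meta(meta_path: Path) -> dict:
    """Read the meta file, or start with an empty dict if it does not exist."""
    return json.loads(meta_path.read_text()) if meta_path.exists() else {}


def run_pipeline(meta_path: Path, handlers: dict[str, Callable[[], None]]) -> list[str]:
    """Run every stage not yet recorded as done; return the stages executed."""
    meta = load_meta(meta_path)
    done = set(meta.get("completed", []))  # hypothetical key
    executed = []
    for stage in STAGES:
        if stage in done:
            continue  # finished in an earlier run
        handlers[stage]()
        done.add(stage)
        executed.append(stage)
        # Checkpoint after each stage so an interrupted run can resume.
        meta["completed"] = [s for s in STAGES if s in done]
        meta_path.write_text(json.dumps(meta, indent=2))
    return executed
```

Writing the checkpoint after every stage, rather than once at the end, is what makes a later resume pick up exactly where the previous run stopped.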
# Clone and setup
git clone https://github.com/number16busshelter/vaio.git
cd vaio
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# Run full automation
vaio ./MyVideo.mp4
VAIO automatically performs:
Audio extraction → Captioning → TD generation → Translation → Caption translation
All outputs are stored beside the video.
VAI0 can enhance content generation with domain-specific knowledge:
# Knowledge sources go here
knowledge/default/
├── product-guides.pdf
├── brand-guidelines.md
├── technical-specs.txt
└── marketing-materials/
# Vector storage (auto-created)
data/kb/default/
Set in your video's .vaio.json:
{
"knowledge": "/path/to/your/knowledge",
"language": "en",
"title": "...",
"description": "..."
}
# Build knowledge base from documents
vaio kb build ./video.mp4
# Set custom knowledge directory
vaio kb set ./video.mp4 --knowledge ./my-docs
# Disable KB for a project
vaio kb set ./video.mp4 --knowledge none
# View KB statistics
vaio kb stats ./video.mp4
# List indexed documents
vaio kb list ./video.mp4
Create tdtmp.txt for structured content generation:
<!-- <Instructions> -->
- Generate high-quality, SEO-optimized content
- Use professional tone
- Preserve all formatting outside semantic blocks
<!-- </Instructions> -->
<!-- <Context> -->
Your brand context and guidelines here
<!-- </Context> -->
<!-- <Video Name> -->
Suggested title inspiration
<!-- </Video Name> -->
<!-- <Video Description> -->
Style and tone guidelines for description
<!-- </Video Description> -->
───────────────────
Your permanent links
Product specifications
Global delivery info
───────────────────
<!-- <Hash tags> -->
#Your #Hashtag #Inspiration
<!-- </Hash tags> -->
VAI0 will:
- Interpret semantic blocks as guidelines
- Generate fresh, optimized content
- Preserve all verbatim formatting exactly
- Optimize hashtags based on content
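The split between semantic blocks and verbatim text can be sketched with a small parser. This is a hedged illustration of one way to read the template format shown above, not VAI0's own parser; `parse_template` and `BLOCK` are hypothetical names.

```python
import re

# Matches <!-- <Name> --> ... <!-- </Name> --> pairs. The backreference \1
# requires the closing marker to carry the same name as the opening one.
BLOCK = re.compile(
    r"<!--\s*<([^/>][^>]*)>\s*-->(.*?)<!--\s*</\1>\s*-->",
    re.DOTALL,
)


def parse_template(text: str) -> tuple[dict[str, str], str]:
    """Return (semantic sections keyed by name, verbatim remainder)."""
    sections = {m.group(1).strip(): m.group(2).strip() for m in BLOCK.finditer(text)}
    # Whatever sits outside the semantic blocks is kept as-is.
    verbatim = BLOCK.sub("", text).strip()
    return sections, verbatim
```

The section contents would then be passed to the model as guidance, while the verbatim remainder is copied into the output unchanged.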
| Dependency | Purpose | Installation |
|---|---|---|
| FFmpeg | Audio extraction | brew install ffmpeg or download |
| Whisper | Speech-to-text | pip install openai-whisper |
| Ollama | Local LLM runtime | Install Ollama |
| Python 3.12+ | Runtime | Python downloads |
vaio check
Expected output:
FFmpeg: ✅ OK
Whisper: ✅ OK
Ollama: ✅ OK
Meta file access: ✅ OK
Knowledge Base: ✅ OK
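For readers curious how such a check could work, here is a minimal sketch that looks for the external dependencies from the table above. It only tests whether the tools can be located, which may be less than what `vaio check` does; `check_dependencies` is a hypothetical helper, and the assumption is that the `ffmpeg` and `ollama` executables are on `PATH` and that openai-whisper is importable as `whisper`.

```python
import importlib.util
import shutil


def check_dependencies() -> dict[str, bool]:
    """Report whether each external tool or package can be located."""
    return {
        # Command-line tools are looked up on PATH.
        "FFmpeg": shutil.which("ffmpeg") is not None,
        # openai-whisper installs a top-level module named "whisper".
        "Whisper": importlib.util.find_spec("whisper") is not None,
        "Ollama": shutil.which("ollama") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_dependencies().items():
        print(f"{name}: {'OK' if ok else 'MISSING'}")
```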
| Command | Purpose |
|---|---|
| `vaio <video>` | Full automation pipeline |
| `vaio audio <video>` | Extract audio & generate captions |
| `vaio desc <video>` | Create SEO title + description |
| `vaio translate <video>` | Translate TDs into multiple languages |
| `vaio captions <video>` | Translate .srt subtitles |
| `vaio continue <video>` | Resume from last completed stage |
| Command | Purpose |
|---|---|
| `vaio kb build <video>` | Build/re-build KB index |
| `vaio kb list <video>` | List indexed documents |
| `vaio kb stats <video>` | Show KB statistics |
| `vaio kb clear <video>` | Clear KB index (keep files) |
| `vaio kb set <video> --knowledge <path>` | Set custom KB path |
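The `knowledge` setting in .vaio.json can be a path or the special value `none`. A sketch of how that value might be resolved is shown below. The fallback to `knowledge/default` when the key is missing is an assumption based on the directory layout, not confirmed behavior, and `resolve_knowledge_dir` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional

# Default location taken from the project layout (assumed fallback).
DEFAULT_KNOWLEDGE_DIR = Path("knowledge/default")


def resolve_knowledge_dir(meta: dict) -> Optional[Path]:
    """Map the "knowledge" field of a .vaio.json dict to a directory.

    Returns None when the knowledge base is disabled for the project.
    """
    value = meta.get("knowledge")
    if value is None:
        return DEFAULT_KNOWLEDGE_DIR  # assumption: missing key means default
    if str(value).strip().lower() == "none":
        return None  # mirrors `vaio kb set <video> --knowledge none`
    return Path(value).expanduser()
```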
MyVideo.mp4
├── MyVideo.mp3
├── captions/
│   ├── MyVideo.en.srt
│   ├── MyVideo.es.srt
│   └── ...
├── description/
│   ├── td.en.txt
│   ├── td.es.txt
│   └── ...
├── knowledge/              # (if project-specific KB)
│   ├── product-info.pdf
│   └── brand-guidelines.md
└── MyVideo.vaio.json       # Progress tracking & config
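The naming pattern in this layout can be expressed as a small helper, which is handy for scripts that need to check whether a run produced everything. This is a sketch derived only from the tree above; `expected_outputs` is a hypothetical function and is not part of the vaio package.

```python
from pathlib import Path


def expected_outputs(video: Path, language_codes: list[str]) -> dict[str, list[Path]]:
    """List the files the pipeline is documented to write next to a video."""
    root, stem = video.parent, video.stem
    return {
        "audio": [root / f"{stem}.mp3"],
        "captions": [root / "captions" / f"{stem}.{code}.srt" for code in language_codes],
        "descriptions": [root / "description" / f"td.{code}.txt" for code in language_codes],
        "meta": [root / f"{stem}.vaio.json"],
    }
```

For example, `expected_outputs(Path("videos/MyVideo.mp4"), ["en", "es"])["captions"]` yields the paths `videos/captions/MyVideo.en.srt` and `videos/captions/MyVideo.es.srt`.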
SOURCE_LANGUAGE = "English"
SOURCE_LANGUAGE_CODE = "en"
TARGET_LANGUAGES = {
"en": "English",
"es": "Spanish",
"fr": "French",
"de": "German",
"ja": "Japanese",
"zh": "Chinese",
}
WHISPER_MODEL = "large-v3-turbo"
OLLAMA_MODEL = "llama3.1:8b"
DEFAULT_EMBED_MODEL = "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
- Supported file types: PDF, TXT, MD, JSON, YAML, CSV
- Auto-ignores: .DS_Store, .git, lock files, system files
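A file filter matching these rules might look like the sketch below. It is an illustration, not the loader shipped in `vaio/kb/loader.py`. Treating `.yml` as YAML and "lock files" as names ending in `.lock` are assumptions, and `iter_knowledge_files` is a hypothetical name.

```python
from pathlib import Path

# Extensions from the supported list; ".yml" is an assumed YAML alias.
SUPPORTED_SUFFIXES = {".pdf", ".txt", ".md", ".json", ".yaml", ".yml", ".csv"}
IGNORED_NAMES = {".DS_Store"}
IGNORED_DIRS = {".git"}


def iter_knowledge_files(root: Path):
    """Yield supported documents under root, skipping ignored entries."""
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Skip anything that lives inside an ignored directory.
        if any(part in IGNORED_DIRS for part in path.relative_to(root).parts):
            continue
        # Assumption: "lock files" means file names ending in ".lock".
        if path.name in IGNORED_NAMES or path.name.endswith(".lock"):
            continue
        if path.suffix.lower() in SUPPORTED_SUFFIXES:
            yield path
```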
# 1. Setup knowledge base
cp -r my-product-docs/ knowledge/default/
# 2. Build KB index
vaio kb build ./product-video.mp4
# 3. Create template
cp tdtmp.example.txt product-video-tdtmp.txt
# Edit template with your brand guidelines...
# 4. Run enhanced generation
vaio desc ./product-video.mp4 --template-file product-video-tdtmp.txt
Output:
KB active: vaio_kb_default (15 documents)
Using template: product-video-tdtmp.txt
Parsed template sections: Instructions, Context, Video Name, Video Description, Hash tags
Generating FRESH description content...
Optimizing hashtags...
✅ TD generated → description/td.en.txt
FROM python:3.12-slim
# Install system dependencies
RUN apt-get update && apt-get install -y ffmpeg && rm -rf /var/lib/apt/lists/*
# Install VAI0
COPY . /app
WORKDIR /app
RUN pip install -r requirements.txt
ENTRYPOINT ["python", "vaio/cli.py"]
Build and run:
docker build -t vaio .
docker run -v "$(pwd)":/workspace vaio /workspace/MyVideo.mp4
vaio/
├── core/
│   ├── audio.py          # Audio extraction
│   ├── description.py    # TD generation with templates
│   ├── translate.py      # Multilingual translation
│   ├── captions.py       # Subtitle processing
│   └── constants.py      # Configuration
├── kb/
│   ├── loader.py         # Document loading
│   ├── store.py          # Vector storage (Chroma)
│   ├── query.py          # Context retrieval
│   └── cli.py            # KB management commands
└── cli.py                # Main entry point
# Test individual stages
vaio audio ./test.mp4
vaio desc ./test.mp4 --template-file tdtmp.example.txt
vaio kb build ./test.mp4
vaio kb stats ./test.mp4
Create .vscode/launch.json:
{
"version": "0.2.0",
"configurations": [
{
"name": "Run VAI0",
"type": "python",
"request": "launch",
"program": "vaio/cli.py",
"args": ["./test.mp4"],
"console": "integratedTerminal"
}
]
}
- FFmpeg - Audio/video processing
- Whisper - Speech recognition
- Ollama - Local LLM runtime
- Chroma - Vector database
- LlamaIndex - Retrieval framework
- Rich - Terminal formatting
MIT License © 2025 AXID.ONE
We welcome contributions! Please see our Contributing Guidelines and check the issue tracker before submitting pull requests.
- Documentation: See docs/llm.txt for technical details
- Issues: GitHub Issues
- Discussions: GitHub Discussions