教學內容工作站 · Teaching Content Studio
Turn exams, slides, documents, code repos and audio into narrated teaching videos, slide decks, infographics and localized content — from one self-hostable server, organized per course, with a human review gate over every AI output.
把考卷、講義、文件、程式碼、音檔,一站式變成有旁白的教學影片、簡報 / 圖卡 / 海報與多語在地化內容 — 單一可自架伺服器、以「一門課一工作空間」管理、且每個 AI 產出都有人工審查關卡。
eduStudio is a single, self-hostable Python FastAPI server that helps teachers (especially STEM / engineering) turn raw materials into polished, publishable teaching content — and keeps a human in the loop over the AI. It merges three formerly separate tools into one unified web app and one deployable backend.
Think of it as "NotebookLM for teachers who publish on YouTube" — but you own the server, and nothing ships until you approve it.
| 🎬 Video | 🎨 Visual | 🌐 Localization |
|---|---|---|
| Exam PDF → blackboard-style worked-solution video | Teaching slides (16 themes, audience/tone steering) | Translate / re-dub external videos |
| Slides PDF → page-by-page narrated lecture | Infographic cards & print-grade posters | Meeting / lecture audio → summary |
| Doc / Repo / URL → AI outline → narrated video | Two-stage outline → full deck → PPTX export | Song mp3 → lyric timeline → AI-image MV |
| Subtitles (SRT) + one-click YouTube upload | Per-slide refine + auto chart/diagram | Flashcards (SM-2), writing correction |
- 🛡️ Human review gate — AI output (especially exam answers / numbers) stops at an editable review screen before rendering. The product's core principle: never publish unverified AI numbers. Exam solutions are review-locked by design.
- 🗂️ One course = one workspace — pick a course at the top; every video and visual you generate is automatically filed under it (sources · tasks · products), NotebookLM-style.
- 🎙️ Your own voice — F5-TTS voice cloning lets narration speak in your voice, with automatic fallback to edge-tts / Google TTS.
- 🧩 Gemini 3 powered — `gemini-3.5-flash` / `gemini-3.1-pro-preview` for text, `gemini-3.1-flash-image` / `gemini-3-pro-image` for images, fully configurable in-app.
- 📤 Publish-ready — PPTX export, YouTube auto-chapters, bilingual subtitle tracks, LaTeX formula rendering, personal-brand footer baked into slides & cards.
- 🔒 Self-hosted & offline-first — your API key, your machine, your data. No third-party SaaS in the loop.
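
The review gate in the list above can be pictured as a small state machine. The sketch below is illustrative only: `ReviewState`, `ReviewItem` and `render` are hypothetical names, not eduStudio's actual API. It shows just the rule that rendering is refused until a human has approved the (possibly edited) AI output.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewState(Enum):
    PENDING = "pending"    # AI output generated, waiting for a human
    APPROVED = "approved"  # a human has checked (and maybe edited) it


@dataclass
class ReviewItem:
    """Hypothetical container for one piece of AI output awaiting review."""
    content: str
    state: ReviewState = ReviewState.PENDING

    def edit(self, new_content: str) -> None:
        # Any edit sends the item back to PENDING so it gets re-checked.
        self.content = new_content
        self.state = ReviewState.PENDING

    def approve(self) -> None:
        self.state = ReviewState.APPROVED


def render(item: ReviewItem) -> str:
    """Refuse to render anything a human has not approved."""
    if item.state is not ReviewState.APPROVED:
        raise PermissionError("output must be approved before rendering")
    return f"RENDERED: {item.content}"
```

In this sketch an edit after approval resets the item to pending, so a corrected exam answer cannot reach the renderer without a second look.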
Screenshots are captured from a running `/app` instance. Drop the images under `docs/screenshots/` with the filenames below and they'll render here.

| The unified `/app` workstation | The human review gate |
|---|---|
| ![The unified /app workstation](docs/screenshots/app-home.png) | ![The human review gate](docs/screenshots/review-gate.png) |
| Pick a course, then Video / Visual / Localization | Every AI answer stops here, editable, until you approve |

| Visual composer (infographics & posters) | Cost panel (real per-station usage) |
|---|---|
| ![Visual composer](docs/screenshots/visual.png) | ![Cost panel](docs/screenshots/usage.png) |

One-command try (Docker) — fastest way to kick the tyres. The bundled image already has ffmpeg + CJK fonts, so you don't install anything except Docker itself:

    cp .env.example .env                          # then put your GEMINI_API_KEY in it
    cp tts_config.example.json tts_config.json    # default edge-tts is fine
    docker compose up -d --build                  # build + start in the background

Then open http://localhost:8000/app/. Stop with `docker compose down` (add `-v` to also wipe the jobs volume). For exposing it beyond localhost (token, CORS, reverse proxy + TLS), follow `docs/DEPLOYMENT.md` — never put it on a public port without setting `EDUSTUDIO_API_TOKEN` first.
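
The "no public port without a token" rule can be expressed as a small launch check. This is a minimal sketch under stated assumptions: `is_loopback` and `safe_to_bind` are hypothetical helpers written for illustration, not functions shipped with eduStudio, and the real hardening steps are the ones in `docs/DEPLOYMENT.md`. Only the `EDUSTUDIO_API_TOKEN` variable name comes from this README.

```python
import ipaddress
import os
from typing import Mapping, Optional


def is_loopback(host: str) -> bool:
    """True for 'localhost' and loopback IPs such as 127.0.0.1 or ::1."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (e.g. a DNS name): treat it as non-loopback.
        return False


def safe_to_bind(host: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Hypothetical check: allow non-loopback binds only when a token is set."""
    env = os.environ if env is None else env
    token = env.get("EDUSTUDIO_API_TOKEN", "").strip()
    return is_loopback(host) or bool(token)
```

A wrapper script could call `safe_to_bind("0.0.0.0")` before starting the server and exit early when it returns `False`.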
Or run it from source:

    # 0. System prerequisites (NOT pip): ffmpeg (+ffprobe) for any render,
    #    and Noto CJK fonts for correct Chinese glyphs. See "Dependency layers" below.

    # 1. Backend (Python 3.12)
    pip install -r requirements.txt        # core deps — enough to run the server
    #    add-ons (only if you need them): requirements-optional.txt (PPTX export / STT /
    #    F5-TTS), requirements-song.txt (SONG MV track), requirements-dev.txt (tests)
    export GEMINI_API_KEY=your_key         # or set it in the in-app Settings page

    # 2. Frontend (the unified /app UI)
    cd frontend && npm install && npx vite build --base=/app/   # --base=/app/ is required
    cd ..

    # 3. Run
    uvicorn server.main:app --host 127.0.0.1 --port 8000

Then open http://127.0.0.1:8000/app/.

Dependencies are split so you install only what you actually use. requirements.txt
alone is enough to run the server and the main pipelines (video, visual, localization
text) — add a layer only when you want the matching feature.

| Layer | Install | What it adds | Without it |
|---|---|---|---|
| core | `pip install -r requirements.txt` | Server + video / visual / localization-text pipelines (Gemini, FastAPI, Pillow, edge-tts, PyMuPDF, matplotlib) | — (always required) |
| optional | `pip install -r requirements-optional.txt` | PPTX export (python-pptx), speech-to-text (faster-whisper, auto GPU→CPU), F5-TTS voice cloning, sample-PDF tool, outro QR | Those specific features fail gracefully; everything else runs |
| song | `pip install -r requirements-song.txt` | SONG MV track only — Demucs + WhisperX (heavy, several GB, GPU recommended) | The song/MV track is unavailable; all other tracks fine |
| dev | `pip install -r requirements-dev.txt` | Test suite (pytest, httpx) | Can't run `pytest tests/` |

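
One common way to get the "fail gracefully" behaviour listed for the optional layer is to probe for the package before using it and raise an actionable error. The sketch below is an assumption about how such a guard could look, not eduStudio's actual code: `FeatureUnavailable`, `require_optional` and `export_pptx_available` are hypothetical names. The only facts taken as given are that python-pptx is imported as `pptx` and that it lives in `requirements-optional.txt`.

```python
import importlib.util


class FeatureUnavailable(RuntimeError):
    """Raised when a feature's optional dependency is not installed."""


def require_optional(module_name: str, feature: str, requirements_file: str) -> None:
    """Raise a clear, actionable error if a top-level module is missing."""
    if importlib.util.find_spec(module_name) is None:
        raise FeatureUnavailable(
            f"{feature} needs '{module_name}'. "
            f"Install it with: pip install -r {requirements_file}"
        )


def export_pptx_available() -> bool:
    """Example probe: PPTX export depends on python-pptx (import name 'pptx')."""
    try:
        require_optional("pptx", "PPTX export", "requirements-optional.txt")
    except FeatureUnavailable:
        return False
    return True
```

A route handler could call `require_optional(...)` at the top and turn `FeatureUnavailable` into an error response, so a missing add-on disables one feature instead of crashing the server.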
System dependencies (installed outside pip):
- ffmpeg / ffprobe — required for any video render or audio extraction. `apt install ffmpeg` · `brew install ffmpeg` · `choco install ffmpeg`.
- Noto CJK fonts (e.g. `fonts-noto-cjk`) — needed for correct Chinese rendering in slides / blackboard. Paths are overridable via `CLAUDE_FONT_PATH` / `CLAUDE_FALLBACK_FONT_PATH` / `CLAUDE_MONO_FONT_PATH`.

The bundled Dockerfile already installs ffmpeg and the CJK fonts for you.
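
When running from source, the system dependencies above can be checked before the first render. The following is a minimal preflight sketch, not part of eduStudio: `preflight` is a hypothetical helper, and it assumes the three font variables point at font files when they are set. It uses only `shutil.which` and `os.path.isfile` from the standard library.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional

FONT_ENV_VARS = (
    "CLAUDE_FONT_PATH",
    "CLAUDE_FALLBACK_FONT_PATH",
    "CLAUDE_MONO_FONT_PATH",
)


def preflight(
    which: Callable[[str], Optional[str]] = shutil.which,
    env: Optional[Mapping[str, str]] = None,
    exists: Callable[[str], bool] = os.path.isfile,
) -> List[str]:
    """Return human-readable problems; an empty list means nothing was found."""
    env = os.environ if env is None else env
    problems = []
    for tool in ("ffmpeg", "ffprobe"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    for var in FONT_ENV_VARS:
        path = env.get(var)
        # Only validate overrides that are actually set.
        if path and not exists(path):
            problems.append(f"{var} points to a missing file: {path}")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```

The `which`, `env` and `exists` parameters are injectable only so the check can be exercised without touching the real system.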

| Path | What | |
|---|---|---|
| `/app` | Unified workstation (Video · Visual · Material/Project · Publish · Status) | primary |
| `/api`, `/localization`, `/projects`, `/jobs` | REST backend (generation, translation, projects, jobs) | |
| `/docs` | Auto-generated OpenAPI docs | |
| `/studio`, `/ui` | Legacy standalone UIs (kept for reference) | legacy |

Python 3.12 · FastAPI · React 19 + Vite · Google Gemini 3 · faster-whisper · F5-TTS · edge-tts · PyMuPDF · python-pptx · matplotlib (LaTeX) · ffmpeg
eduStudio 是一套單一、可自架的 Python FastAPI 伺服器,幫老師(尤其理工 / 工程科)把原始素材變成可發布的教學內容,而且全程人工把關 AI 產出。它把三個原本獨立的工具整合成一個 Web 介面 + 一個可部署後端。
可以想成 「給在 YouTube 上課的老師用的 NotebookLM」 — 但伺服器是你自己的,東西沒按下核准就不會出去。
| 🎬 影片 | 🎨 視覺 | 🌐 在地化 |
|---|---|---|
| 考卷 PDF → 黑板風格逐題解答影片 | 教學簡報(16 種主題、受眾/語氣引導) | 外部影片翻譯 / 重新配音 |
| 簡報 PDF → 逐頁旁白講解影片 | 資訊圖卡 & 印刷級海報 | 會議 / 演講錄音 → 重點摘要 |
| 文件 / Repo / 網址 → AI 大綱 → 講解影片 | 兩階段大綱 → 完整簡報 → PPTX 匯出 | 歌曲 mp3 → 歌詞時間軸 → AI 生圖 MV |
| 字幕(SRT)+ 一鍵上傳 YouTube | 單頁微調 + 自動圖表/架構圖 | 單字卡(SM-2)、寫作批改 |
- 🛡️ 人工審查關卡 — AI 產出(尤其解題答案 / 數字)會停在可編輯的審查頁,核准後才渲染。核心原則:絕不發布未經查證的 AI 數值。考卷解答一律強制審查。
- 🗂️ 一門課=一工作空間 — 右上選課,之後產的每支影片 / 每張圖卡都自動歸到該課(來源 · 任務 · 成品),NotebookLM 式管理。
- 🎙️ 你自己的聲音 — F5-TTS 聲音複製讓旁白用你的聲音念,並自動退回 edge-tts / Google TTS。
- 🧩 Gemini 3 驅動 — 文字用 `gemini-3.5-flash` / `gemini-3.1-pro-preview`,圖片用 `gemini-3.1-flash-image` / `gemini-3-pro-image`,App 內可自由設定。
- 📤 隨時可發布 — PPTX 匯出、YouTube 自動章節、雙語字幕軌、LaTeX 公式渲染、個人品牌頁尾自動帶進簡報與圖卡。
- 🔒 自架、離線優先 — 你的 API key、你的機器、你的資料,中間不經第三方 SaaS。
截圖取自實際跑起來的 `/app`。把圖檔以下方檔名放進 `docs/screenshots/` 即會顯示於此。

| 統一 `/app` 工作站 | 人工審查關卡 |
|---|---|
| ![統一 /app 工作站](docs/screenshots/app-home.png) | ![人工審查關卡](docs/screenshots/review-gate.png) |
| 右上選課,再切影片 / 視覺 / 在地化 | 每個 AI 答案都停在這裡、可編輯,核准前不外流 |

| 視覺工作台(圖卡 & 海報) | 成本面板(各站真實用量) |
|---|---|
| ![視覺工作台](docs/screenshots/visual.png) | ![成本面板](docs/screenshots/usage.png) |

一鍵體驗(Docker) — 試水溫最快的路。內附 image 已裝好 ffmpeg + CJK 字型,除了 Docker 本身你什麼都不用裝:

    cp .env.example .env                          # 填入你的 GEMINI_API_KEY
    cp tts_config.example.json tts_config.json    # 預設 edge-tts 即可
    docker compose up -d --build                  # 建置 + 背景啟動

接著打開 http://localhost:8000/app/。停止用 `docker compose down`(加 `-v` 連 jobs volume 一起清)。要暴露到 localhost 以外(token、CORS、反向代理 + TLS)請照 `docs/DEPLOYMENT.md` — 沒設 `EDUSTUDIO_API_TOKEN` 前別開公網 port。
或從原始碼跑:

    # 0. 系統相依 (非 pip): ffmpeg (+ffprobe) 任何 render 都要、Noto CJK 字型確保中文正常。
    #    詳見下方「依賴分層」。

    # 1. 後端 (Python 3.12)
    pip install -r requirements.txt        # 核心依賴 — 裝這個就能跑 server
    #    按需加裝: requirements-optional.txt(PPTX 匯出 / 語音轉文字 / F5-TTS)、
    #    requirements-song.txt(SONG MV 軸)、requirements-dev.txt(跑測試)
    export GEMINI_API_KEY=你的金鑰          # 或直接在 App 的「設定」頁填

    # 2. 前端 (統一 /app 介面)
    cd frontend && npm install && npx vite build --base=/app/   # --base=/app/ 一定要帶
    cd ..

    # 3. 啟動
    uvicorn server.main:app --host 127.0.0.1 --port 8000

接著打開 http://127.0.0.1:8000/app/。

依賴刻意拆開,只裝你會用到的。光裝 requirements.txt 就足以跑起 server 與主要 pipeline
(影片、視覺、在地化文字)——要用哪個功能再加裝對應那層即可。

| 分層 | 安裝 | 加了什麼 | 不裝的話 |
|---|---|---|---|
| 核心 core | `pip install -r requirements.txt` | Server + 影片 / 視覺 / 在地化文字 pipeline(Gemini、FastAPI、Pillow、edge-tts、PyMuPDF、matplotlib) | —(一定要裝) |
| 選用 optional | `pip install -r requirements-optional.txt` | PPTX 匯出(python-pptx)、語音轉文字(faster-whisper,自動 GPU→CPU)、F5-TTS 聲音複製、樣本 PDF 工具、outro QR | 對應功能會優雅報錯,其餘照常 |
| song | `pip install -r requirements-song.txt` | 只有 SONG MV 軸 — Demucs + WhisperX(重、數 GB、建議 GPU) | song/MV 軸無法用,其他軸不受影響 |
| dev | `pip install -r requirements-dev.txt` | 測試套件(pytest、httpx) | 無法跑 `pytest tests/` |

系統相依(非 pip 安裝):
- ffmpeg / ffprobe — 任何影片 render 或抽音訊必需。`apt install ffmpeg` / `brew install ffmpeg` / `choco install ffmpeg`。
- Noto CJK 字型(例 `fonts-noto-cjk`)— 簡報/黑板中文正確顯示所需。路徑可用 `CLAUDE_FONT_PATH` / `CLAUDE_FALLBACK_FONT_PATH` / `CLAUDE_MONO_FONT_PATH` 覆寫。

內附的 Dockerfile 已幫你裝好 ffmpeg 與 CJK 字型。
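
    eduStudio/
    ├── core/        Backend core (video pipeline / infocards visuals / translation for localization / project …)
    ├── server/      FastAPI routes
    ├── frontend/    Source of the unified /app frontend (React 19 + Vite, self-contained build)
    ├── web/         Frontend build output (static files for /app, /studio, /ui)
    ├── tests/       2300+ pytest tests
    └── STATUS.yaml  Current project status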