🎓 eduStudio

教學內容工作站 · Teaching Content Studio

Turn exams, slides, documents, code repos and audio into narrated teaching videos, slide decks, infographics and localized content — from one self-hostable server, organized per course, with a human review gate over every AI output.

把考卷、講義、文件、程式碼、音檔，一站式變成有旁白的教學影片、簡報 / 圖卡 / 海報與多語在地化內容 — 單一可自架伺服器、以「一門課一工作空間」管理、且每個 AI 產出都有人工審查關卡。

English · 繁體中文

🇬🇧 English

What is eduStudio?

eduStudio is a single, self-hostable Python FastAPI server that helps teachers (especially in STEM and engineering) turn raw materials into polished, publishable teaching content, while keeping a human in the loop over the AI. It merges three formerly separate tools (autoSolver, infoCard and translateGemma) into one unified web app and one deployable backend.

Think of it as "NotebookLM for teachers who publish on YouTube" — but you own the server, and nothing ships until you approve it.

Three pillars

🎬 Video	🎨 Visual	🌐 Localization
Exam PDF → blackboard-style worked-solution video	Teaching slides (16 themes, audience/tone steering)	Translate / re-dub external videos
Slides PDF → page-by-page narrated lecture	Infographic cards & print-grade posters	Meeting / lecture audio → summary
Doc / Repo / URL → AI outline → narrated video	Two-stage outline → full deck → PPTX export	Song mp3 → lyric timeline → AI-image MV
Subtitles (SRT) + one-click YouTube upload	Per-slide refine + auto chart/diagram	Flashcards (SM-2), writing correction
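The Flashcards (SM-2) entry names the SuperMemo-2 spaced-repetition algorithm. For readers unfamiliar with it, here is a minimal sketch of the widely published SM-2 update rule. `CardState` and `sm2_review` are hypothetical names, and eduStudio's own scheduler may differ in details such as rounding or when the easiness factor is updated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardState:
    repetitions: int = 0   # n: consecutive successful reviews
    easiness: float = 2.5  # EF: easiness factor, never below 1.3
    interval: int = 0      # I: days until the next review


def sm2_review(state: CardState, quality: int) -> CardState:
    """Apply one SM-2 review; quality is the learner's 0-5 self-grade."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality >= 3:
        # Successful recall: 1 day, then 6 days, then grow by the current EF.
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            interval = round(state.interval * state.easiness)
        repetitions = state.repetitions + 1
    else:
        # Failed recall: restart the sequence with a 1-day interval.
        repetitions, interval = 0, 1
    # EF rises for easy answers and falls for hard ones, floored at 1.3.
    penalty = 5 - quality
    easiness = max(1.3, state.easiness + 0.1 - penalty * (0.08 + penalty * 0.02))
    return CardState(repetitions, easiness, interval)
```

For a new card graded 5 three times in a row, the intervals come out as 1, 6 and 16 days; a single grade below 3 resets the card to a 1-day interval.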
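The Subtitles (SRT) output uses the SubRip format: numbered cues, each with an `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by the caption text, with cues separated by blank lines. A small sketch of producing it from (start, end, text) tuples; the function names and the cue tuple shape are assumptions, not eduStudio's API.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm (comma before ms)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(cues: list[tuple[float, float, str]]) -> str:
    """Render (start_s, end_s, text) cues as an SRT document."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        if end <= start:
            raise ValueError(f"cue {index} ends before it starts")
        # Cue number, time range, caption text; blocks are joined by a blank line.
        blocks.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)
```

A bilingual track in this scheme would simply be a second file rendered from the same timings with translated text.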

Highlights

🛡️ Human review gate — AI output (especially exam answers / numbers) stops at an editable review screen before rendering. The product's core principle: never publish unverified AI numbers. Exam solutions are review-locked by design.
🗂️ One course = one workspace — pick a course at the top; every video and visual you generate is automatically filed under it (sources · tasks · products), NotebookLM-style.
🎙️ Your own voice — F5-TTS voice cloning lets narration speak in your voice, with automatic fallback to edge-tts / Google TTS.
🧩 Gemini 3 powered — gemini-3.5-flash / gemini-3.1-pro-preview for text, gemini-3.1-flash-image / gemini-3-pro-image for images, fully configurable in-app.
📤 Publish-ready — PPTX export, YouTube auto-chapters, bilingual subtitle tracks, LaTeX formula rendering, personal-brand footer baked into slides & cards.
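🔒 Self-hosted & offline-first — your API key, your machine, your data. Apart from the AI and speech services you configure (Gemini, plus edge-tts / Google TTS when used), no third-party SaaS sits in the loop.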
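The review gate can be thought of as a tiny state machine: AI output is held as pending, a human may edit it, and rendering refuses anything not explicitly approved. The sketch below is an illustrative model only. `ReviewItem`, `ReviewState` and `render` are hypothetical, and the rule that an edit re-opens review is an assumption made so that a changed number always gets a second look.

```python
from dataclasses import dataclass, field
from enum import Enum


class ReviewState(Enum):
    PENDING_REVIEW = "pending_review"  # AI output waits here, editable
    APPROVED = "approved"              # a human signed off; rendering allowed


@dataclass
class ReviewItem:
    """One AI-generated answer held at the review gate (hypothetical model)."""
    ai_answer: str
    state: ReviewState = ReviewState.PENDING_REVIEW
    edits: list[str] = field(default_factory=list)

    @property
    def answer(self) -> str:
        # The latest human edit wins over the raw AI output.
        return self.edits[-1] if self.edits else self.ai_answer

    def edit(self, new_answer: str) -> None:
        # Editing re-opens review, so a changed value must be approved again.
        self.edits.append(new_answer)
        self.state = ReviewState.PENDING_REVIEW

    def approve(self) -> None:
        self.state = ReviewState.APPROVED


def render(item: ReviewItem) -> str:
    """Refuse to render anything a human has not approved."""
    if item.state is not ReviewState.APPROVED:
        raise PermissionError("review gate: answer not approved yet")
    return f"[rendered] {item.answer}"
```

Keeping the raw AI answer alongside the edit history also leaves a record of what the reviewer changed.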
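The voice fallback (F5-TTS first, then edge-tts, then Google TTS) is an ordered try-next chain. A minimal sketch, assuming each engine is wrapped as a callable that writes audio to a path and raises on failure; these wrappers and `synthesize_with_fallback` are hypothetical stand-ins, not the project's TTS backend API.

```python
from collections.abc import Callable, Sequence

# Assumed wrapper shape: takes (text, output_path), writes audio, raises on failure.
TTSBackend = Callable[[str, str], None]


def synthesize_with_fallback(
    text: str,
    out_path: str,
    backends: Sequence[tuple[str, TTSBackend]],
) -> str:
    """Try each backend in order and return the name of the first that succeeds."""
    errors: list[str] = []
    for name, backend in backends:
        try:
            backend(text, out_path)
            return name
        except Exception as exc:  # any failure (missing package, network, ...) falls through
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all TTS backends failed; " + "; ".join(errors))
```

Ordered as the README describes, e.g. `[("f5-tts", ...), ("edge-tts", ...), ("google-tts", ...)]`, a missing F5-TTS install degrades to edge-tts instead of failing the render.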
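YouTube builds chapters from a timestamp list in the video description. This sketch assumes YouTube's published rules for description chapters (first timestamp at 0:00, at least three timestamps in ascending order, each chapter at least 10 seconds long). How eduStudio derives the chapter times is not covered here, and `chapters_block` / `yt_timestamp` are hypothetical.

```python
def yt_timestamp(seconds: int) -> str:
    """Format whole seconds as M:SS, or H:MM:SS once past the hour."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes}:{secs:02d}"


def chapters_block(chapters: list[tuple[int, str]]) -> str:
    """Build a description chapter list from (start_seconds, title) pairs."""
    starts = [start for start, _ in chapters]
    if len(chapters) < 3 or starts[0] != 0:
        raise ValueError("need at least 3 chapters, the first starting at 0:00")
    # Ascending and >= 10 s apart; the last chapter's length needs the video duration.
    for prev, nxt in zip(starts, starts[1:]):
        if nxt - prev < 10:
            raise ValueError("chapters must be ascending and at least 10 s apart")
    return "\n".join(f"{yt_timestamp(s)} {title}" for s, title in chapters)
```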

Screenshots

Screenshots are captured from a running /app instance. Drop the images under docs/screenshots/ with the filenames below and they'll render here.

The unified `/app` workstation	The human review gate
`docs/screenshots/app-home.png`	`docs/screenshots/review-gate.png`
Pick a course, then Video / Visual / Localization	Every AI answer stops here, editable, until you approve

Visual composer (infographics & posters)	Cost panel (real per-station usage)
`docs/screenshots/visual.png`	`docs/screenshots/usage.png`

Quick start

One-command try (Docker) — fastest way to kick the tyres. The bundled image already has ffmpeg + CJK fonts, so you don't install anything except Docker itself:

cp .env.example .env          # then put your GEMINI_API_KEY in it
cp tts_config.example.json tts_config.json   # default edge-tts is fine
docker compose up -d --build  # build + start in the background

Then open http://localhost:8000/app/. Stop with docker compose down (add -v to also wipe the jobs volume). For exposing it beyond localhost (token, CORS, reverse proxy + TLS), follow docs/DEPLOYMENT.md, and never put it on a public port without setting EDUSTUDIO_API_TOKEN first.
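How the server validates `EDUSTUDIO_API_TOKEN` is covered in docs/DEPLOYMENT.md rather than here, so the sketch below only shows one way to generate a strong value with Python's standard `secrets` module. Putting it in `.env` next to `GEMINI_API_KEY` is an assumption based on the Docker setup above; `new_api_token` is a hypothetical helper name.

```python
import secrets


def new_api_token(nbytes: int = 32) -> str:
    """Return a URL-safe random token suitable for EDUSTUDIO_API_TOKEN."""
    # token_urlsafe draws from the OS CSPRNG; 32 bytes encode to 43 base64url chars.
    return secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    # Paste the printed line into .env (assumed to be read like GEMINI_API_KEY).
    print(f"EDUSTUDIO_API_TOKEN={new_api_token()}")
```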

Or run it from source:

# 0. System prerequisites (NOT pip): ffmpeg (+ffprobe) for any render,
#    and Noto CJK fonts for correct Chinese glyphs. See "Dependency layers" below.

# 1. Backend (Python 3.12)
pip install -r requirements.txt          # core deps — enough to run the server
#   add-ons (only if you need them): requirements-optional.txt (PPTX export / STT /
#   F5-TTS), requirements-song.txt (SONG MV track), requirements-dev.txt (tests)
export GEMINI_API_KEY=your_key           # or set it in the in-app Settings page

# 2. Frontend (the unified /app UI)
cd frontend && npm install && npx vite build --base=/app/   # --base=/app/ is required: the UI is served under /app/
cd ..

# 3. Run
uvicorn server.main:app --host 127.0.0.1 --port 8000

Then open http://127.0.0.1:8000/app/.
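Both `docker compose up -d` and a backgrounded `uvicorn` can return before the app is ready to serve. If you script the startup, a small standard-library poll like this one can wait for `/app/` to answer. `wait_until_up` is a hypothetical helper, and treating an HTTP 200 as "ready" is an assumption.

```python
import time
import urllib.error
import urllib.request


def wait_until_up(url: str = "http://127.0.0.1:8000/app/", timeout: float = 30.0) -> bool:
    """Poll url until it answers HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # not listening yet (or returned an HTTP error); retry shortly
        time.sleep(0.5)
    return False
```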

Dependency layers

Dependencies are split so you install only what you actually use. requirements.txt alone is enough to run the server and the main pipelines (video, visual, localization text) — add a layer only when you want the matching feature.

Layer	Install	What it adds	Without it
core	`pip install -r requirements.txt`	Server + video / visual / localization-text pipelines (Gemini, FastAPI, Pillow, edge-tts, PyMuPDF, matplotlib)	— (always required)
optional	`pip install -r requirements-optional.txt`	PPTX export (`python-pptx`), speech-to-text (`faster-whisper`, auto GPU→CPU), F5-TTS voice cloning, sample-PDF tool, outro QR	Those specific features fail gracefully; everything else runs
song	`pip install -r requirements-song.txt`	SONG MV track only — Demucs + WhisperX (heavy, several GB, GPU recommended)	The song/MV track is unavailable; all other tracks fine
dev	`pip install -r requirements-dev.txt`	Test suite (`pytest`, `httpx`)	Can't run `pytest tests/`
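Because a missing layer only disables its own features, it can help to check which layers are present before starting. This sketch uses `importlib.util.find_spec`, which locates a top-level module without importing it. The probe module names per layer are assumptions about what each requirements file installs (for example, `python-pptx` imports as `pptx`), not a list taken from the repo.

```python
import importlib.util

# Assumed import names that signal each optional layer is installed.
LAYER_PROBES: dict[str, list[str]] = {
    "optional": ["pptx", "faster_whisper", "f5_tts"],
    "song": ["demucs", "whisperx"],
    "dev": ["pytest", "httpx"],
}


def installed_layers(probes: dict[str, list[str]] = LAYER_PROBES) -> dict[str, dict[str, bool]]:
    """Report, per layer, which probe modules are importable (without importing them)."""
    return {
        layer: {name: importlib.util.find_spec(name) is not None for name in modules}
        for layer, modules in probes.items()
    }


if __name__ == "__main__":
    for layer, modules in installed_layers().items():
        missing = [name for name, ok in modules.items() if not ok]
        status = "complete" if not missing else "missing " + ", ".join(missing)
        print(f"{layer:9s} {status}")
```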

System dependencies (installed outside pip):

ffmpeg / ffprobe — required for any video render or audio extraction. Install with apt install ffmpeg (Debian/Ubuntu), brew install ffmpeg (macOS) or choco install ffmpeg (Windows).
Noto CJK fonts (e.g. fonts-noto-cjk) — needed for correct Chinese rendering in slides and blackboard-style videos. Font paths can be overridden via CLAUDE_FONT_PATH / CLAUDE_FALLBACK_FONT_PATH / CLAUDE_MONO_FONT_PATH.

The bundled Dockerfile already installs ffmpeg and the CJK fonts for you.
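For a from-source install, a pre-flight check of the system dependencies might look like the following: `shutil.which` looks for ffmpeg / ffprobe on PATH, and a font path is taken from `CLAUDE_FONT_PATH` when that variable is set. The helper names and the default Noto CJK path (the usual Debian/Ubuntu location for `fonts-noto-cjk`) are assumptions; the project's own font lookup may differ.

```python
import os
import shutil
from pathlib import Path

# Assumed default: where Debian/Ubuntu's fonts-noto-cjk package usually puts the font.
DEFAULT_CJK_FONT = "/usr/share/fonts/opentype/noto/NotoSansCJK-Regular.ttc"


def missing_binaries(names: tuple[str, ...] = ("ffmpeg", "ffprobe")) -> list[str]:
    """Return the required executables that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def resolve_font(env_var: str, default: str) -> Path:
    """Prefer a font path from the environment, else fall back to a default."""
    override = os.environ.get(env_var)
    path = Path(override) if override else Path(default)
    if not path.is_file():
        raise FileNotFoundError(f"{env_var}: font not found at {path}")
    return path


if __name__ == "__main__":
    print("missing binaries:", missing_binaries() or "none")
    try:
        print("font:", resolve_font("CLAUDE_FONT_PATH", DEFAULT_CJK_FONT))
    except FileNotFoundError as exc:
        print(exc)
```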

Interfaces

Path	What
`/app`	Unified workstation (Video · Visual · Material/Project · Publish · Status); the primary interface
`/api`, `/localization`, `/projects`, `/jobs`	REST backend (generation, translation, projects, jobs)
`/docs`	Auto-generated OpenAPI docs
`/studio`, `/ui`	Legacy standalone UIs (kept for reference)
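Since `/docs` is FastAPI's generated documentation, the machine-readable schema is normally served at `/openapi.json` (FastAPI's default, assumed unchanged here). A short sketch that fetches it from a running server and groups endpoints by their first path segment, which should line up with the REST prefixes in the table; `fetch_schema` and `group_paths` are hypothetical helpers.

```python
import json
import urllib.request
from collections import defaultdict

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def group_paths(openapi: dict) -> dict[str, list[str]]:
    """Group 'METHOD /path' entries by their first path segment."""
    groups: dict[str, list[str]] = defaultdict(list)
    for path, operations in openapi.get("paths", {}).items():
        prefix = "/" + path.strip("/").split("/", 1)[0]
        for method in operations:
            if method in HTTP_METHODS:  # skip path-level keys such as "parameters"
                groups[prefix].append(f"{method.upper()} {path}")
    return dict(groups)


def fetch_schema(base: str = "http://127.0.0.1:8000") -> dict:
    """Download the OpenAPI schema from a running server (assumed default URL)."""
    with urllib.request.urlopen(f"{base}/openapi.json", timeout=5) as resp:
        return json.load(resp)
```

With the server running, `print(json.dumps(group_paths(fetch_schema()), indent=2))` lists the routes per prefix.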

Tech stack

Python 3.12 · FastAPI · React 19 + Vite · Google Gemini 3 · faster-whisper · F5-TTS · edge-tts · PyMuPDF · python-pptx · matplotlib (LaTeX) · ffmpeg

🇹🇼 繁體中文

eduStudio 是什麼？

eduStudio 是一套單一、可自架的 Python FastAPI 伺服器，幫老師（尤其理工 / 工程科）把原始素材變成可發布的教學內容，而且全程人工把關 AI 產出。它把三個原本獨立的工具整合成一個 Web 介面 + 一個可部署後端。

可以想成 「給在 YouTube 上課的老師用的 NotebookLM」 — 但伺服器是你自己的，東西沒按下核准就不會出去。

三大支柱

🎬 影片	🎨 視覺	🌐 在地化
考卷 PDF → 黑板風格逐題解答影片	教學簡報（16 種主題、受眾/語氣引導）	外部影片翻譯 / 重新配音
簡報 PDF → 逐頁旁白講解影片	資訊圖卡 & 印刷級海報	會議 / 演講錄音 → 重點摘要
文件 / Repo / 網址 → AI 大綱 → 講解影片	兩階段大綱 → 完整簡報 → PPTX 匯出	歌曲 mp3 → 歌詞時間軸 → AI 生圖 MV
字幕（SRT）+ 一鍵上傳 YouTube	單頁微調 + 自動圖表/架構圖	單字卡（SM-2）、寫作批改

特色

🛡️ 人工審查關卡 — AI 產出（尤其解題答案 / 數字）會停在可編輯的審查頁，核准後才渲染。核心原則：絕不發布未經查證的 AI 數值。考卷解答一律強制審查。
🗂️ 一門課＝一工作空間 — 右上選課，之後產的每支影片 / 每張圖卡都自動歸到該課（來源 · 任務 · 成品），NotebookLM 式管理。
🎙️ 你自己的聲音 — F5-TTS 聲音複製讓旁白用你的聲音念，並自動退回 edge-tts / Google TTS。
🧩 Gemini 3 驅動 — 文字用 gemini-3.5-flash / gemini-3.1-pro-preview，圖片用 gemini-3.1-flash-image / gemini-3-pro-image，App 內可自由設定。
📤 隨時可發布 — PPTX 匯出、YouTube 自動章節、雙語字幕軌、LaTeX 公式渲染、個人品牌頁尾自動帶進簡報與圖卡。
🔒 自架、離線優先 — 你的 API key、你的機器、你的資料，中間不經第三方 SaaS。

截圖

截圖取自實際跑起來的 /app。把圖檔以下方檔名放進 docs/screenshots/ 即會顯示於此。

統一 `/app` 工作站	人工審查關卡
`docs/screenshots/app-home.png`	`docs/screenshots/review-gate.png`
右上選課，再切影片 / 視覺 / 在地化	每個 AI 答案都停在這裡、可編輯，核准前不外流

視覺工作台（圖卡 & 海報）	成本面板（各站真實用量）
`docs/screenshots/visual.png`	`docs/screenshots/usage.png`

快速開始

一鍵體驗（Docker） — 試水溫最快的路。內附 image 已裝好 ffmpeg + CJK 字型，除了 Docker 本身你什麼都不用裝：

cp .env.example .env          # 填入你的 GEMINI_API_KEY
cp tts_config.example.json tts_config.json   # 預設 edge-tts 即可
docker compose up -d --build  # 建置 + 背景啟動

接著打開 http://localhost:8000/app/。停止用 docker compose down（加 -v 連 jobs volume 一起清）。要暴露到 localhost 以外（token、CORS、反向代理 + TLS）請照 docs/DEPLOYMENT.md — 沒設 EDUSTUDIO_API_TOKEN 前別開公網 port。

或從原始碼跑：

# 0. 系統相依 (非 pip): ffmpeg (+ffprobe) 任何 render 都要、Noto CJK 字型確保中文正常。
#    詳見下方「依賴分層」。

# 1. 後端 (Python 3.12)
pip install -r requirements.txt          # 核心依賴 — 裝這個就能跑 server
#   按需加裝: requirements-optional.txt(PPTX 匯出 / 語音轉文字 / F5-TTS)、
#   requirements-song.txt(SONG MV 軸)、requirements-dev.txt(跑測試)
export GEMINI_API_KEY=你的金鑰            # 或直接在 App 的「設定」頁填

# 2. 前端 (統一 /app 介面)
cd frontend && npm install && npx vite build --base=/app/   # --base=/app/ 一定要帶
cd ..

# 3. 啟動
uvicorn server.main:app --host 127.0.0.1 --port 8000

接著打開 http://127.0.0.1:8000/app/。

依賴分層

依賴刻意拆開，只裝你會用到的。光裝 requirements.txt 就足以跑起 server 與主要 pipeline （影片、視覺、在地化文字）——要用哪個功能再加裝對應那層即可。

分層	安裝	加了什麼	不裝的話
核心 core	`pip install -r requirements.txt`	Server + 影片 / 視覺 / 在地化文字 pipeline（Gemini、FastAPI、Pillow、edge-tts、PyMuPDF、matplotlib）	—（一定要裝）
選用 optional	`pip install -r requirements-optional.txt`	PPTX 匯出（`python-pptx`）、語音轉文字（`faster-whisper`，自動 GPU→CPU）、F5-TTS 聲音複製、樣本 PDF 工具、outro QR	對應功能會優雅報錯，其餘照常
song	`pip install -r requirements-song.txt`	只有 SONG MV 軸 — Demucs + WhisperX（重、數 GB、建議 GPU）	song/MV 軸無法用，其他軸不受影響
dev	`pip install -r requirements-dev.txt`	測試套件（`pytest`、`httpx`）	無法跑 `pytest tests/`

系統相依（非 pip 安裝）：

ffmpeg / ffprobe — 任何影片 render 或抽音訊必需。apt install ffmpeg／brew install ffmpeg／choco install ffmpeg。
Noto CJK 字型（例 fonts-noto-cjk）— 簡報／黑板中文正確顯示所需。路徑可用 CLAUDE_FONT_PATH／CLAUDE_FALLBACK_FONT_PATH／CLAUDE_MONO_FONT_PATH 覆寫。

內附的 Dockerfile 已幫你裝好 ffmpeg 與 CJK 字型。

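專案結構 · Project structure

eduStudio/
├── core/          Backend core (video pipeline / infocards visuals / translation localization / project …)
├── server/        FastAPI routes
├── frontend/      Source of the unified /app frontend (React 19 + Vite, self-contained build)
├── web/           Frontend build output (static files for /app, /studio, /ui)
├── tests/         2300+ pytest tests
└── STATUS.yaml    Current project status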

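Author · 劉瑞弘 Juihung Liu, Associate Professor, Department of Intelligent Automation Engineering, National Chin-Yi University of Technology (國立勤益科技大學智慧自動化工程系) · DOF Lab

The three predecessor projects (autoSolver / infoCard / translateGemma) have been merged into this repo; their original repos are kept for reference on finer-grained features.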
