dofliu/eduStudio


🎓 eduStudio

教學內容工作站 · Teaching Content Studio

Turn exams, slides, documents, code repos and audio into narrated teaching videos, slide decks, infographics and localized content — from one self-hostable server, organized per course, with a human review gate over every AI output.

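Turn exams, handouts, documents, code and audio files into narrated teaching videos, slide decks, infographic cards, posters and multilingual localized content in one place: a single self-hostable server, managed as one workspace per course, with a human review gate on every AI output.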


English · 繁體中文


🇬🇧 English

What is eduStudio?

eduStudio is a single, self-hostable Python FastAPI server that helps teachers (especially STEM / engineering) turn raw materials into polished, publishable teaching content — and keeps a human in the loop over the AI. It merges three formerly separate tools into one unified web app and one deployable backend.

Think of it as "NotebookLM for teachers who publish on YouTube" — but you own the server, and nothing ships until you approve it.

Three pillars

| 🎬 Video | 🎨 Visual | 🌐 Localization |
| --- | --- | --- |
| Exam PDF → blackboard-style worked-solution video | Teaching slides (16 themes, audience/tone steering) | Translate / re-dub external videos |
| Slides PDF → page-by-page narrated lecture | Infographic cards & print-grade posters | Meeting / lecture audio → summary |
| Doc / Repo / URL → AI outline → narrated video | Two-stage outline → full deck → PPTX export | Song mp3 → lyric timeline → AI-image MV |
| Subtitles (SRT) + one-click YouTube upload | Per-slide refine + auto chart/diagram | Flashcards (SM-2), writing correction |
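
The flashcard entry names SM-2, the classic SuperMemo spaced-repetition schedule. The README does not show how eduStudio implements it, so the following is only a minimal sketch of the textbook algorithm; `CardState` and `sm2_review` are hypothetical names, and the choice to leave the ease factor untouched after a failed review is one common reading of SM-2, not a confirmed detail of this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CardState:
    repetitions: int = 0    # consecutive successful reviews so far
    interval_days: int = 0  # days to wait before the next review
    ease: float = 2.5       # SM-2 easiness factor, never allowed below 1.3


def sm2_review(state: CardState, quality: int) -> CardState:
    """Apply one review graded 0-5 and return the updated card state."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")

    if quality < 3:
        # Failed recall: restart the sequence, keep the ease factor as-is.
        return CardState(repetitions=0, interval_days=1, ease=state.ease)

    # Successful recall: fixed first two intervals, then multiply by the ease.
    if state.repetitions == 0:
        interval = 1
    elif state.repetitions == 1:
        interval = 6
    else:
        interval = round(state.interval_days * state.ease)

    # Standard SM-2 ease update, clamped to the 1.3 floor.
    delta = 0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02)
    ease = max(1.3, state.ease + delta)
    return CardState(state.repetitions + 1, interval, ease)
```

Feeding each review's result back in as the next `state` yields intervals that grow while recall stays good and collapse to one day after a miss.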

Highlights

  • 🛡️ Human review gate — AI output (especially exam answers / numbers) stops at an editable review screen before rendering. The product's core principle: never publish unverified AI numbers. Exam solutions are review-locked by design.
  • 🗂️ One course = one workspace — pick a course at the top; every video and visual you generate is automatically filed under it (sources · tasks · products), NotebookLM-style.
  • 🎙️ Your own voice — F5-TTS voice cloning lets narration speak in your voice, with automatic fallback to edge-tts / Google TTS.
  • 🧩 Gemini 3 powered: gemini-3.5-flash / gemini-3.1-pro-preview for text, gemini-3.1-flash-image / gemini-3-pro-image for images, fully configurable in-app.
  • 📤 Publish-ready — PPTX export, YouTube auto-chapters, bilingual subtitle tracks, LaTeX formula rendering, personal-brand footer baked into slides & cards.
  • 🔒 Self-hosted & offline-first — your API key, your machine, your data. No third-party SaaS in the loop.
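
The human review gate can be pictured as a small state machine in which rendering is refused until a person approves the current text. This is an illustrative sketch only: `ReviewItem`, `ReviewStatus`, `NotApprovedError` and `render` are hypothetical names, not eduStudio's actual classes, and the rule that an edit resets approval is an assumption about how such a gate would reasonably behave.

```python
from dataclasses import dataclass
from enum import Enum


class ReviewStatus(Enum):
    PENDING = "pending"    # AI output exists but no human has checked it
    APPROVED = "approved"  # a human signed off on the current text


class NotApprovedError(RuntimeError):
    """Raised when rendering is attempted before human approval."""


@dataclass
class ReviewItem:
    text: str
    status: ReviewStatus = ReviewStatus.PENDING

    def edit(self, new_text: str) -> None:
        # Any edit invalidates an earlier approval, so the gate re-locks.
        self.text = new_text
        self.status = ReviewStatus.PENDING

    def approve(self) -> None:
        self.status = ReviewStatus.APPROVED


def render(item: ReviewItem) -> str:
    """Stand-in for the render step; it refuses unapproved content."""
    if item.status is not ReviewStatus.APPROVED:
        raise NotApprovedError("review required before rendering")
    return f"rendered: {item.text}"
```

Putting the check inside the render step, rather than in the UI, means no code path can skip the review by calling the renderer directly.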

Screenshots

Screenshots are captured from a running /app instance. Drop the images under docs/screenshots/ with the filenames below and they'll render here.

| The unified /app workstation | The human review gate |
| --- | --- |
| docs/screenshots/app-home.png | docs/screenshots/review-gate.png |
| Pick a course, then Video / Visual / Localization | Every AI answer stops here, editable, until you approve |
| Visual composer (infographics & posters) | Cost panel (real per-station usage) |
| docs/screenshots/visual.png | docs/screenshots/usage.png |

Quick start

One-command try (Docker) — fastest way to kick the tyres. The bundled image already has ffmpeg + CJK fonts, so you don't install anything except Docker itself:

cp .env.example .env          # then put your GEMINI_API_KEY in it
cp tts_config.example.json tts_config.json   # default edge-tts is fine
docker compose up -d --build  # build + start in the background

Then open http://localhost:8000/app/. Stop with docker compose down (add -v to also wipe the jobs volume). To expose it beyond localhost (token, CORS, reverse proxy + TLS), follow docs/DEPLOYMENT.md. Never put it on a public port without setting EDUSTUDIO_API_TOKEN first.
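
The warning about public ports can be turned into a pre-flight check in a launcher script. The sketch below is an assumption about how one might enforce it, not something the README says eduStudio does; `is_loopback` and `check_exposure` are hypothetical helpers, and only the EDUSTUDIO_API_TOKEN variable name comes from the text above.

```python
import ipaddress
import os
from typing import Mapping, Optional


def is_loopback(host: str) -> bool:
    """Return True when the bind address is only reachable from this machine."""
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal (some other hostname): treat it as exposed.
        return False


def check_exposure(host: str, env: Optional[Mapping[str, str]] = None) -> None:
    """Refuse a non-loopback bind address when no API token is configured."""
    env = os.environ if env is None else env
    if not is_loopback(host) and not env.get("EDUSTUDIO_API_TOKEN"):
        raise RuntimeError(
            f"refusing to bind to {host}: set EDUSTUDIO_API_TOKEN first"
        )
```

A launcher could call `check_exposure(host)` with the same host value it passes to uvicorn, so a bind address such as 0.0.0.0 fails fast instead of starting an unauthenticated server.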

Or run it from source:

# 0. System prerequisites (NOT pip): ffmpeg (+ffprobe) for any render,
#    and Noto CJK fonts for correct Chinese glyphs. See "Dependency layers" below.

# 1. Backend (Python 3.12)
pip install -r requirements.txt          # core deps — enough to run the server
#   add-ons (only if you need them): requirements-optional.txt (PPTX export / STT /
#   F5-TTS), requirements-song.txt (SONG MV track), requirements-dev.txt (tests)
export GEMINI_API_KEY=your_key           # or set it in the in-app Settings page

# 2. Frontend (the unified /app UI)
cd frontend && npm install && npx vite build --base=/app/   # --base=/app/ is required
cd ..

# 3. Run
uvicorn server.main:app --host 127.0.0.1 --port 8000

Then open http://127.0.0.1:8000/app/.

Dependency layers

Dependencies are split so you install only what you actually use. requirements.txt alone is enough to run the server and the main pipelines (video, visual, localization text) — add a layer only when you want the matching feature.

| Layer | Install | What it adds | Without it |
| --- | --- | --- | --- |
| core | pip install -r requirements.txt | Server + video / visual / localization-text pipelines (Gemini, FastAPI, Pillow, edge-tts, PyMuPDF, matplotlib) | n/a (always required) |
| optional | pip install -r requirements-optional.txt | PPTX export (python-pptx), speech-to-text (faster-whisper, auto GPU→CPU), F5-TTS voice cloning, sample-PDF tool, outro QR | Those specific features fail gracefully; everything else runs |
| song | pip install -r requirements-song.txt | SONG MV track only: Demucs + WhisperX (heavy, several GB, GPU recommended) | The song/MV track is unavailable; all other tracks are fine |
| dev | pip install -r requirements-dev.txt | Test suite (pytest, httpx) | Can't run pytest tests/ |
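
To see which layers are present in the current environment, one option is to probe for a representative module from each layer without importing it. This is a rough sketch, assuming the import names below (`pptx` for python-pptx, `faster_whisper` for faster-whisper, and so on) are a fair proxy for each requirements file; `LAYER_MODULES` and `installed_layers` are hypothetical and not part of eduStudio.

```python
from importlib.util import find_spec
from typing import Callable, Dict, List

# Assumed mapping from dependency layer to a few import names it provides.
LAYER_MODULES: Dict[str, List[str]] = {
    "optional": ["pptx", "faster_whisper"],
    "song": ["demucs", "whisperx"],
    "dev": ["pytest", "httpx"],
}


def _has_module(name: str) -> bool:
    # find_spec locates a top-level module without importing (and loading) it.
    return find_spec(name) is not None


def installed_layers(
    has_module: Callable[[str], bool] = _has_module,
) -> Dict[str, bool]:
    """Report, per layer, whether all of its probe modules are importable."""
    return {
        layer: all(has_module(module) for module in modules)
        for layer, modules in LAYER_MODULES.items()
    }


if __name__ == "__main__":
    for layer, present in installed_layers().items():
        print(f"{layer}: {'installed' if present else 'missing'}")
```

Probing with `find_spec` keeps the check cheap, which matters for the song layer, where actually importing the packages can be slow.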

System dependencies (installed outside pip):

  • ffmpeg / ffprobe: required for any video render or audio extraction. apt install ffmpeg · brew install ffmpeg · choco install ffmpeg.
  • Noto CJK fonts (e.g. fonts-noto-cjk) — needed for correct Chinese rendering in slides / blackboard. Paths are overridable via CLAUDE_FONT_PATH / CLAUDE_FALLBACK_FONT_PATH / CLAUDE_MONO_FONT_PATH.

The bundled Dockerfile already installs ffmpeg and the CJK fonts for you.
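
When running from source, a quick check for these system dependencies can save a failed render later. The following is a minimal sketch under stated assumptions: `missing_system_deps` is a hypothetical helper, and it only verifies that ffmpeg and ffprobe are on PATH and that any font override variable that is set points to an existing file. It does not detect whether Noto CJK fonts are installed system-wide.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional

# Font override variables named in the README.
FONT_ENV_VARS = (
    "CLAUDE_FONT_PATH",
    "CLAUDE_FALLBACK_FONT_PATH",
    "CLAUDE_MONO_FONT_PATH",
)


def missing_system_deps(
    which: Callable[[str], Optional[str]] = shutil.which,
    env: Optional[Mapping[str, str]] = None,
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems: List[str] = []

    for tool in ("ffmpeg", "ffprobe"):
        if which(tool) is None:
            problems.append(f"{tool} not found on PATH")

    for var in FONT_ENV_VARS:
        path = env.get(var)
        if path and not exists(path):
            problems.append(f"{var} points to a missing file: {path}")

    return problems


if __name__ == "__main__":
    for problem in missing_system_deps():
        print("WARNING:", problem)
```

The `which`, `env` and `exists` parameters exist only so the check can be exercised without touching the real system.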

Interfaces

| Path | What |
| --- | --- |
| /app | Unified workstation (Video · Visual · Material/Project · Publish · Status) (primary) |
| /api, /localization, /projects, /jobs | REST backend (generation, translation, projects, jobs) |
| /docs | Auto-generated OpenAPI docs |
| /studio, /ui | Legacy standalone UIs, kept for reference (legacy) |
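
Because the backend is a FastAPI app, its route groups can be inspected from the OpenAPI schema, which FastAPI serves at /openapi.json by default. The sketch below assumes that default has not been changed in eduStudio; `fetch_schema` and `group_paths` are hypothetical helpers, and no concrete endpoint names are implied.

```python
import json
from collections import defaultdict
from typing import Dict, List
from urllib.request import urlopen


def fetch_schema(base_url: str = "http://127.0.0.1:8000") -> dict:
    """Download the OpenAPI document (assumes FastAPI's default schema URL)."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


def group_paths(schema: dict) -> Dict[str, List[str]]:
    """Group route paths by their first segment, e.g. '/jobs' or '/api'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(schema.get("paths", {})):
        first_segment = path.strip("/").split("/", 1)[0]
        groups["/" + first_segment].append(path)
    return dict(groups)


if __name__ == "__main__":
    # Requires a running server; prints how many routes sit under each prefix.
    for prefix, paths in group_paths(fetch_schema()).items():
        print(prefix, len(paths))
```

Against a running instance this gives a quick view of how many routes sit under each prefix listed in the table.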

Tech stack

Python 3.12 · FastAPI · React 19 + Vite · Google Gemini 3 · faster-whisper · F5-TTS · edge-tts · PyMuPDF · python-pptx · matplotlib (LaTeX) · ffmpeg


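🇹🇼 繁體中文 (Traditional Chinese)

What is eduStudio?

eduStudio is a single, self-hostable Python FastAPI server that helps teachers (especially in science and engineering) turn raw materials into publishable teaching content, with human oversight of the AI output throughout. It merges three formerly separate tools into one web interface plus one deployable backend.

Think of it as "NotebookLM for teachers who teach on YouTube", except the server is yours and nothing goes out until you approve it.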

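Three pillars

| 🎬 Video | 🎨 Visual | 🌐 Localization |
| --- | --- | --- |
| Exam PDF → blackboard-style question-by-question solution video | Teaching slides (16 themes, audience/tone steering) | Translate / re-dub external videos |
| Slides PDF → page-by-page narrated lecture video | Infographic cards & print-grade posters | Meeting / lecture recordings → key-point summary |
| Document / Repo / URL → AI outline → narrated video | Two-stage outline → full deck → PPTX export | Song mp3 → lyric timeline → AI-image MV |
| Subtitles (SRT) + one-click YouTube upload | Per-slide refinement + automatic charts / architecture diagrams | Flashcards (SM-2), writing correction |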

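Highlights

  • 🛡️ Human review gate: AI output (especially solution answers and numbers) stops at an editable review page and is rendered only after approval. Core principle: never publish unverified AI numbers. Exam solutions always require review.
  • 🗂️ One course = one workspace: pick a course at the top right, and every video or card you generate afterwards is filed under that course automatically (sources · tasks · products), NotebookLM-style.
  • 🎙️ Your own voice: F5-TTS voice cloning lets the narration speak in your voice, with automatic fallback to edge-tts / Google TTS.
  • 🧩 Gemini 3 powered: gemini-3.5-flash / gemini-3.1-pro-preview for text, gemini-3.1-flash-image / gemini-3-pro-image for images, freely configurable in the app.
  • 📤 Publish-ready: PPTX export, YouTube auto-chapters, bilingual subtitle tracks, LaTeX formula rendering, and a personal-brand footer applied automatically to slides and cards.
  • 🔒 Self-hosted, offline-first: your API key, your machine, your data, with no third-party SaaS in between.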

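Screenshots

Screenshots are taken from a running /app instance. Put the image files under docs/screenshots/ with the filenames below and they will be displayed here.

| The unified /app workstation | The human review gate |
| --- | --- |
| docs/screenshots/app-home.png | docs/screenshots/review-gate.png |
| Pick a course at the top right, then switch between Video / Visual / Localization | Every AI answer stops here, editable, and does not leave until approved |
| Visual workbench (cards & posters) | Cost panel (real usage per station) |
| docs/screenshots/visual.png | docs/screenshots/usage.png |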

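Quick start

One-command try (Docker): the fastest way to test the waters. The bundled image already has ffmpeg + CJK fonts installed, so you don't need to install anything except Docker itself:

cp .env.example .env          # fill in your GEMINI_API_KEY
cp tts_config.example.json tts_config.json   # the default edge-tts is fine
docker compose up -d --build  # build + start in the background

Then open http://localhost:8000/app/. Stop with docker compose down (add -v to also wipe the jobs volume). To expose it beyond localhost (token, CORS, reverse proxy + TLS), follow docs/DEPLOYMENT.md. Do not open a public port before setting EDUSTUDIO_API_TOKEN.

Or run it from source:

# 0. System dependencies (not pip): ffmpeg (+ffprobe) is needed for any render,
#    and Noto CJK fonts ensure Chinese displays correctly. See "Dependency layers" below.

# 1. Backend (Python 3.12)
pip install -r requirements.txt          # core deps: enough to run the server
#   install as needed: requirements-optional.txt (PPTX export / speech-to-text / F5-TTS),
#   requirements-song.txt (SONG MV track), requirements-dev.txt (running tests)
export GEMINI_API_KEY=your_key           # or fill it in on the app's Settings page

# 2. Frontend (the unified /app UI)
cd frontend && npm install && npx vite build --base=/app/   # --base=/app/ is mandatory
cd ..

# 3. Run
uvicorn server.main:app --host 127.0.0.1 --port 8000

Then open http://127.0.0.1:8000/app/.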

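Dependency layers

Dependencies are deliberately split so you install only what you will use. Installing requirements.txt alone is enough to run the server and the main pipelines (video, visual, localization text); add the matching layer only when you need its feature.

| Layer | Install | What it adds | Without it |
| --- | --- | --- | --- |
| core | pip install -r requirements.txt | Server + video / visual / localization-text pipelines (Gemini, FastAPI, Pillow, edge-tts, PyMuPDF, matplotlib) | n/a (always required) |
| optional | pip install -r requirements-optional.txt | PPTX export (python-pptx), speech-to-text (faster-whisper, auto GPU→CPU), F5-TTS voice cloning, sample-PDF tool, outro QR | The matching features report errors gracefully; everything else works as usual |
| song | pip install -r requirements-song.txt | SONG MV track only: Demucs + WhisperX (heavy, several GB, GPU recommended) | The song/MV track is unavailable; other tracks are unaffected |
| dev | pip install -r requirements-dev.txt | Test suite (pytest, httpx) | Can't run pytest tests/ |

System dependencies (installed outside pip):

  • ffmpeg / ffprobe: required for any video render or audio extraction. apt install ffmpeg · brew install ffmpeg · choco install ffmpeg.
  • Noto CJK fonts (e.g. fonts-noto-cjk): needed for Chinese to display correctly in slides / blackboard. Paths can be overridden with CLAUDE_FONT_PATH / CLAUDE_FALLBACK_FONT_PATH / CLAUDE_MONO_FONT_PATH.

The bundled Dockerfile already installs ffmpeg and the CJK fonts for you.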

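Project structure

eduStudio/
├── core/          Backend core (video pipeline / infocards visuals / translation localization / project …)
├── server/        FastAPI routes
├── frontend/      Source of the unified /app frontend (React 19 + Vite, self-contained build)
├── web/           Frontend build output (static files for /app, /studio, /ui)
├── tests/         2300+ pytest tests
└── STATUS.yaml    Current project status

Author · 劉瑞弘 Juihung Liu, Associate Professor, Department of Intelligent Automation Engineering, National Chin-Yi University of Technology · DOF Lab

The three predecessor projects (autoSolver / infoCard / translateGemma) have been merged here; their original repos are kept as a reference for feature details.

License · MIT, © 2026 劉瑞弘 Juihung Liu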
