cuppibla/google-study-pack-generator

Study Pack Generator

Note: This is a Google-owned project.

Generates a unified Study Pack Markdown note from multiple course tabs using Google ADK + Gemini 2.5 Flash + Playwright.

Opens your browser tabs, captures screenshots, runs visual reasoning, and synthesises everything into a structured Obsidian-compatible note.

Features

  • Visual Reasoning Pipeline — Gemini analyses each tab (classify, prioritise, detect gaps, compare sources, navigate by vision)
  • Multi-tab browser capture — Playwright opens all tabs and captures full-page screenshots + text
  • ADK Agent — SequentialAgent: TabClassifier → ContentExtractor → NoteSynthesizer
  • Obsidian Integration — Saves notes to your vault with YAML frontmatter, tags, and objectives
  • Telegram Bot — Trigger the pipeline via Telegram message
  • Web UI — React chat interface at /chat
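The three-stage flow named in the ADK Agent bullet can be pictured as functions run in order over a shared state dictionary. The sketch below is plain Python, not the ADK API: the stage functions, the state keys, and the URL-based classification rule are hypothetical stand-ins for what the Gemini-backed agents do.

```python
from typing import Callable

State = dict  # shared state handed from one stage to the next


def tab_classifier(state: State) -> State:
    # Hypothetical rule: call a tab "video" if its URL contains "watch".
    state["classified"] = [
        {"url": url, "kind": "video" if "watch" in url else "article"}
        for url in state["tabs"]
    ]
    return state


def content_extractor(state: State) -> State:
    # Placeholder: a real stage would read captured text and screenshots.
    state["extracts"] = [f"[{t['kind']}] {t['url']}" for t in state["classified"]]
    return state


def note_synthesizer(state: State) -> State:
    # Merge every extract into a single Markdown note.
    bullets = "\n".join(f"- {e}" for e in state["extracts"])
    state["note"] = "# Study Pack\n\n" + bullets
    return state


def run_sequential(stages: list[Callable[[State], State]], state: State) -> State:
    # Run each stage in order; each one sees what the previous stages wrote.
    for stage in stages:
        state = stage(state)
    return state


result = run_sequential(
    [tab_classifier, content_extractor, note_synthesizer],
    {"tabs": ["https://example.com/lecture-notes", "https://example.com/watch?v=abc"]},
)
print(result["note"])
```

Keeping all intermediate results in one state object means a later stage can read anything an earlier stage produced without the stages calling each other directly.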

Setup

# Install dependencies
uv sync

# Install Playwright browsers
uv run playwright install chromium

# Configure environment
cp .env.example .env   # add your keys

.env keys (TELEGRAM_BOT_TOKEN is optional):

GOOGLE_API_KEY=...
OBSIDIAN_VAULT_PATH=/path/to/your/vault
TELEGRAM_BOT_TOKEN=...   # optional
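A startup check along these lines can catch a missing key before the pipeline runs. This is a minimal sketch, assuming the variables from .env have already been loaded into the process environment; the `missing_keys` helper is hypothetical and not part of the project.

```python
import os
from typing import Mapping, Optional

REQUIRED_KEYS = ("GOOGLE_API_KEY", "OBSIDIAN_VAULT_PATH")
OPTIONAL_KEYS = ("TELEGRAM_BOT_TOKEN",)  # only needed for the Telegram bot


def missing_keys(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required keys that are unset or blank (hypothetical helper)."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key, "").strip()]


# Example with an explicit mapping instead of the real environment.
example = missing_keys({"GOOGLE_API_KEY": "abc123", "OBSIDIAN_VAULT_PATH": "  "})
print(example)
```

Passing a mapping explicitly, as in the example, keeps the check easy to test without touching the real environment.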

Running

# Web server (FastAPI + chat UI at http://localhost:8080/chat)
uv run python main.py

# Telegram bot (standalone)
uv run python scripts/telegram_bot.py

Testing

# Stages 1-4: search + browser capture (no LLM)
uv run python scripts/test_e2e.py

# Full pipeline with LLM
uv run python scripts/test_e2e.py --full

# Visual content tests (images, GIF, video thumbnail)
uv run python scripts/test_e2e.py --visual

# All 5 Gemini visual reasoning features
uv run python scripts/test_e2e.py --vision-reasoning

# FastAPI /run_sse round-trip
uv run python scripts/test_e2e.py --http
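The /run_sse endpoint name suggests a server-sent events stream. The sketch below shows how a client might decode one, assuming each event arrives as one or more `data:` lines carrying JSON and ends with a blank line (standard SSE framing). The `iter_sse_json` helper and the sample payload fields are hypothetical; the actual event schema is not documented here.

```python
import json
from typing import Iterable, Iterator


def iter_sse_json(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one decoded JSON object per server-sent event (hypothetical helper)."""
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            if value.startswith(" "):
                value = value[1:]  # SSE strips a single leading space
            data_parts.append(value)
        elif line == "" and data_parts:
            # A blank line ends the event; multi-line data is joined with "\n".
            yield json.loads("\n".join(data_parts))
            data_parts = []
        # Comment lines (starting with ":") and other fields are ignored.
    if data_parts:
        # Lenient choice: also emit a final event that lacks the closing blank line.
        yield json.loads("\n".join(data_parts))


sample = [
    'data: {"text": "Classifying tabs"}\n',
    "\n",
    ": keep-alive\n",
    'data: {"text": "Done"}\n',
    "\n",
]
events = list(iter_sse_json(sample))
print(events)
```

Because the parser works on any iterable of lines, it can be fed a streamed HTTP response body line by line or, as here, a fixed list.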

Architecture

main.py                     FastAPI server (:8080)
scripts/telegram_bot.py     Telegram bot
src/agent/
  __init__.py               ADK root_agent
  agent.py                  generate_study_pack tool
  pipeline.py               SequentialAgent pipeline
src/tools/
  browser_capture.py        Playwright tab capture + visual navigation
  gemini_vision.py          5 visual reasoning features
  web_search.py             DuckDuckGo search (ddgs)
  markdown_writer.py        Obsidian note writer
frontend/index.html         React chat UI
fixtures/                   HTML fixture pages for testing
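To illustrate the kind of output the Obsidian note writer is described as producing, here is a minimal sketch that renders a note with YAML frontmatter, tags, and objectives. The `render_note` function, the frontmatter field names, and the sample values are assumptions for illustration, not the contents of markdown_writer.py.

```python
import json
from datetime import date
from typing import Optional


def render_note(
    title: str,
    tags: list[str],
    objectives: list[str],
    body: str,
    created: Optional[date] = None,
) -> str:
    """Render a Markdown note with YAML frontmatter (hypothetical layout)."""
    created = created or date.today()

    def q(value: str) -> str:
        # A JSON string literal is also a valid double-quoted YAML scalar.
        return json.dumps(value, ensure_ascii=False)

    lines = ["---", f"title: {q(title)}", f"created: {created.isoformat()}", "tags:"]
    lines += [f"  - {q(tag)}" for tag in tags]
    lines += ["objectives:"]
    lines += [f"  - {q(obj)}" for obj in objectives]
    lines += ["---", "", f"# {title}", "", body.rstrip(), ""]
    return "\n".join(lines)


note = render_note(
    title="Transformers 101",
    tags=["study-pack", "ml"],
    objectives=["Explain self-attention"],
    body="Summary goes here.",
    created=date(2025, 1, 1),
)
print(note)
```

Writing the result to a `.md` file under the directory named by OBSIDIAN_VAULT_PATH would place the note in the vault.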
