
MangaQA — Progress Tracker

Update this file as you complete each phase. Check off subtasks as they're done.


Phase 1: Project Setup & Infrastructure

Status: COMPLETED Started: 2026-03-19 Completed: 2026-03-19

  • 1.1 Repository Setup
    • Initialize git repo
    • Create monorepo structure (/frontend + /backend)
    • Add .gitignore
    • Add README.md
  • 1.2 Backend Scaffold
    • Python virtual environment
    • Install core dependencies
    • FastAPI app + health check
    • Environment config (.env)
    • Alembic setup
  • 1.3 Database Setup (Supabase)
    • Create Supabase project
    • Enable pgvector
    • Design + create schema
    • Run initial migration
  • 1.4 Frontend Scaffold
    • Vite + React + TypeScript
    • Install dependencies
    • Routing skeleton
    • API client utility

Phase 2: Data Upload & Storage Pipeline

Status: COMPLETED Started: 2026-03-20 Completed: 2026-03-20

  • 2.1 Upload API
    • POST /api/projects
    • GET /api/projects
    • GET /api/projects/{id}
    • DELETE /api/projects/{id}
    • POST /api/projects/{id}/chapters
    • GET /api/projects/{id}/chapters
  • 2.2 JSON Validation & Parsing
    • Pydantic upload models
    • Validation + error messages
    • Store to DB
  • 2.3 Frontend — Upload Flow
    • Dashboard page
    • Create project form
    • Project view with chapters
    • JSON upload with preview
    • Success/error feedback
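Validation in 2.2 is done with Pydantic models; the sketch below shows the same kind of structural checks in plain Python for brevity. The field names (`title`, `lines`, `speaker`, `text`) are hypothetical — the real upload schema may differ:

```python
def validate_chapter(payload: dict) -> list[str]:
    """Return a list of human-readable validation errors (empty list = valid).

    Hypothetical schema: {"title": str, "lines": [{"speaker": str, "text": str}]}
    """
    errors = []
    if not isinstance(payload.get("title"), str) or not payload["title"].strip():
        errors.append("title: must be a non-empty string")
    lines = payload.get("lines")
    if not isinstance(lines, list) or not lines:
        errors.append("lines: must be a non-empty list")
        return errors
    for i, line in enumerate(lines):
        if not isinstance(line, dict):
            errors.append(f"lines[{i}]: must be an object")
            continue
        if not isinstance(line.get("speaker"), str):
            errors.append(f"lines[{i}].speaker: must be a string")
        if not isinstance(line.get("text"), str) or not line.get("text", "").strip():
            errors.append(f"lines[{i}].text: must be a non-empty string")
    return errors
```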

Phase 3: Embedding Generation

Status: COMPLETED Started: 2026-03-21 Completed: 2026-03-21

  • 3.1 Embedding Integration
    • HuggingFace Inference API client (replaced OpenRouter — no free embedding models)
    • Model: BAAI/bge-small-en-v1.5 (384-dim, free tier)
    • Embedding generation function with batch encoding + normalization
    • Alembic migration: vector dimension 1536 → 384
  • 3.2 OpenRouter LLM Client
    • Rate-limited client (sliding-window, 20 req/min)
    • Retry logic with exponential backoff on 429
    • JSON response parsing with markdown fence stripping
  • 3.3 Similarity Search
    • pgvector cosine distance queries
    • Find similar-but-different line pairs
    • Speaker centroid computation + outlier detection
    • End-to-end test
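The rate-limited client in 3.2 uses a sliding window (20 requests/minute). A minimal sketch of the windowing logic with an injectable clock for testing — class and method names are illustrative, not the actual client:

```python
import time
from collections import deque

class SlidingWindowLimiter:
    """Allow at most `max_calls` within any `period`-second window."""

    def __init__(self, max_calls: int = 20, period: float = 60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock
        self.calls: deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0 = go now)."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        return self.period - (now - self.calls[0])

    def record(self) -> None:
        """Register a call at the current time."""
        self.calls.append(self.clock())
```

The retry-on-429 logic would sit on top of this: sleep for `wait_time()` before each request, then back off exponentially if the provider still returns 429.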

Phase 4: QA Engine — The 4 Checkers

Status: COMPLETED Started: 2026-03-21 Completed: 2026-03-21

  • 4.1 Untranslated Text Checker
    • Unicode range detection (CJK, Hiragana, Katakana)
    • Japanese ratio classification (critical ≥80%, warning <80%)
    • Store results with context JSONB
  • 4.2 Consistency Checker
    • pgvector similarity search for near-duplicate lines
    • LLM batch validation (10 pairs per call)
    • Store confirmed inconsistencies
  • 4.3 Character Voice Checker
    • Group lines by speaker (min 5 lines)
    • Compute centroid embeddings (numpy mean + normalize)
    • Detect outliers via cosine distance
    • LLM voice validation with typical examples
    • Store results
  • 4.4 Tone Checker
    • Group by page, merge adjacent short scenes
    • LLM tone analysis per scene block
    • Parse findings with severity classification
    • Store results
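The untranslated-text checker in 4.1 rests on Unicode range detection. A sketch of the core ratio computation — the ranges and the 80% threshold come from the checklist above, but the helper names and the decision to ignore whitespace are assumptions:

```python
from typing import Optional

def is_japanese_char(ch: str) -> bool:
    """True for Hiragana, Katakana, or CJK Unified Ideographs."""
    cp = ord(ch)
    return (
        0x3040 <= cp <= 0x309F      # Hiragana
        or 0x30A0 <= cp <= 0x30FF   # Katakana
        or 0x4E00 <= cp <= 0x9FFF   # CJK Unified Ideographs
    )

def classify_line(text: str) -> Optional[str]:
    """'critical' if >=80% Japanese, 'warning' if any Japanese below that, else None."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return None
    ratio = sum(is_japanese_char(c) for c in chars) / len(chars)
    if ratio >= 0.8:
        return "critical"
    if ratio > 0:
        return "warning"
    return None
```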

Phase 5: Job Queue & Analysis Orchestration

Status: COMPLETED Started: 2026-03-21 Completed: 2026-03-21

  • 5.1 Job Queue
    • POST /api/projects/{id}/jobs/analyze (202 Accepted)
    • Job record creation with duplicate check
    • FastAPI BackgroundTasks worker
    • Sequential checker execution with per-checker commits
    • Status transitions (pending → running → completed/failed)
    • In-memory progress tracking
  • 5.2 Job Status API
    • GET /api/projects/{id}/jobs
    • GET /api/projects/{id}/jobs/{job_id} with live progress
    • Frontend polling (10s interval)
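The status transitions in 5.1 (pending → running → completed/failed) can be guarded with a small transition table so a job never moves backwards. A sketch with illustrative names:

```python
# Legal transitions for an analysis job; anything else is rejected.
TRANSITIONS = {
    "pending": {"running"},
    "running": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}

def advance(current: str, new: str) -> str:
    """Validate and return the new status, or raise on an illegal move."""
    if new not in TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition {current!r} -> {new!r}")
    return new
```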

Phase 6: QA Report & Frontend

Status: COMPLETED Started: 2026-03-21 Completed: 2026-03-21

  • 6.1 Report API
    • GET /api/projects/{id}/report (latest completed job)
    • Summary stats (by severity, by checker)
    • Dialogue line context via selectinload
  • 6.2 Frontend — Report View
    • Summary cards (total, critical, warning, info)
    • Checker breakdown cards
    • Filter dropdowns (checker type, severity) — client-side
    • Expandable issue cards with dialogue context + suggestions
  • 6.3 Frontend — Job Status
    • "Run Analysis" button with live progress display
    • Job status badges (pending/running/completed/failed)
    • Auto-navigate to report on completion
  • 6.4 Authentication
    • JWT auth (python-jose + bcrypt)
    • Login page + protected routes
    • Axios interceptors (auto-attach token, redirect on 401)
    • Logout button in nav
    • User creation script (credentials from .env)
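The report's summary stats (6.1) group findings by severity and by checker. A sketch with `collections.Counter` — the finding shape (`severity` and `checker` keys) is a hypothetical simplification of the DB rows:

```python
from collections import Counter

def summarize(findings: list[dict]) -> dict:
    """Aggregate counts for the report's summary cards.

    Each finding is assumed to carry 'severity' and 'checker' keys.
    """
    return {
        "total": len(findings),
        "by_severity": dict(Counter(f["severity"] for f in findings)),
        "by_checker": dict(Counter(f["checker"] for f in findings)),
    }
```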

Phase 7: Testing & Polish

Status: SKIPPED Reason: MVP scope — formal tests and UI polish deferred for now

  • 7.1 Synthetic Test Data — 2 sample chapters exist (sample_chapter.json, sample_chapter_2.json)
  • 7.2 Backend Tests — skipped
  • 7.3 Frontend Polish — skipped

Phase 8: Deployment

Status: COMPLETED Started: 2026-03-22 Completed: 2026-03-22

  • 8.1 Supabase
    • Production schema verified (all migrations applied)
    • Session Pooler (pgbouncer) connection configured
  • 8.2 Backend → Render
    • backend/Dockerfile created
    • render.yaml with all env vars
    • .dockerignore for clean builds
    • Deploy + health check (manual step)
    • Test endpoints (manual step)
  • 8.3 Frontend → Vercel
    • Root directory: frontend, framework: Vite, output: dist
    • VITE_API_BASE_URL env var documented
    • Deploy + verify (manual step)
    • Test full flow (manual step)
  • 8.4 Post-Deployment
    • CORS configuration via CORS_ORIGINS env var
    • Deployment notes in README

Notes

  • Supabase direct connection requires IPv6; using Session Pooler (pgbouncer) instead — requires statement_cache_size=0 in asyncpg connect_args
  • Alembic configured for async migrations with pgbouncer compatibility
  • pgvector extension enabled via initial Alembic migration
  • OpenRouter has no free embedding models — using HuggingFace Inference API (BAAI/bge-small-en-v1.5) instead
  • Avoided sentence-transformers local install (~500MB PyTorch) — would break Render free tier deployment
  • Supabase has 8s default statement timeout on DDL — use SET statement_timeout = '0' in migrations or run DDL via SQL Editor
  • OpenRouter free tier: 20 req/min, 200 req/day — ~28 LLM calls per chapter analysis (~84s)
  • Job progress tracked in-memory (not DB) to avoid pgbouncer transaction/lock issues
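The pgbouncer note above translates to a small engine-configuration fragment: asyncpg's prepared-statement cache must be disabled when connecting through the Session Pooler. A sketch (variable names are illustrative):

```python
import os
from sqlalchemy.ext.asyncio import create_async_engine

# Supabase Session Pooler URL, e.g. postgresql+asyncpg://...pooler.supabase.com:5432/postgres
DATABASE_URL = os.environ["DATABASE_URL"]

# pgbouncer breaks asyncpg's prepared-statement cache, so disable it entirely.
engine = create_async_engine(
    DATABASE_URL,
    connect_args={"statement_cache_size": 0},
)
```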