RAG-powered academic paper Q&A system with multi-paper comparison, relationship graph visualization, and prompt injection vulnerability analysis
Paperprobeλ νμ λ Όλ¬Έ PDFλ₯Ό μ λ‘λνκ³ , λ Όλ¬Έ λ΄μ© κΈ°λ°μ μ§μμλ΅ Β· λ Όλ¬Έ κ° λΉκ΅ λΆμ Β· μ μ¬λ κ·Έλν μκ°ν Β· 보μ μ·¨μ½μ λΆμμ μ 곡νλ RAG(Retrieval-Augmented Generation) μμ€ν μ λλ€.
- π₯ PDF Upload & Parsing β λ Όλ¬Έ μ λ‘λ μ μλ ν μ€νΈ μΆμΆ, μ²νΉ, μλ² λ© μ μ₯
- π¬ Paper Q&A β λ Όλ¬Έ λ΄μ© κΈ°λ° μ§μμλ΅ (μ€νΈλ¦¬λ° μλ΅)
- βοΈ Multi-paper Comparison β μ΅λ 5κ° λ Όλ¬Έμ μ°κ΅¬ λͺ©μ / λ°©λ²λ‘ / κ²°κ³Ό / νκ³ λΉκ΅
- πΈοΈ Relationship Graph β λ Όλ¬Έ κ° μ½μ¬μΈ μ μ¬λ κΈ°λ° μΈν°λν°λΈ λ€νΈμν¬ κ·Έλν (D3.js)
- π Prompt Injection Detection β μλ§¨ν± μ μ¬λ κΈ°λ° μ μ± μ§μλ¬Έ νμ§ λ° λ°©μ΄ (ν/μ μ§μ)
- ποΈ Paper Management β λ Όλ¬Έ λͺ©λ‘ μ‘°ν λ° μμ
βββββββββββββββββββ ββββββββββββββββββββββββββββββββββββββββββββ
β Next.js β HTTP β FastAPI Backend β
β Frontend βββββββββΊβ β
β β β βββββββββββββββ ββββββββββββββββββββ β
β - Upload/Delete β β β PDF Parser β β Embedder β β
β - Chat (Q&A) β β β (pdfplumber)β β(all-MiniLM-L6-v2)β β
β - Compare β β ββββββββ¬βββββββ ββββββββββ¬ββββββββββ β
β - Graph (D3.js) β β β β β
β - Security β β ββββββββΌβββββββββββββββββββΌβββββββββββ β
βββββββββββββββββββ β β ChromaDB β β
β β (per-paper collection) β β
β ββββββββββββββββββββββββββββββββββββββ β
β β
β ββββββββββββββββ ββββββββββββββββββββ β
β β SQLite β β LLM Backend β β
β β (metadata) β β Ollama / Gemini β β
β ββββββββββββββββ ββββββββββββββββββββ β
ββββββββββββββββββββββββββββββββββββββββββββ
| Layer | Technology |
|---|---|
| Frontend | Next.js, TypeScript, Tailwind CSS, D3.js |
| Backend | FastAPI, Python 3.10 |
| Vector DB | ChromaDB 0.4.24 |
| Metadata DB | SQLite + SQLAlchemy |
| Embedding | sentence-transformers (all-MiniLM-L6-v2) |
| LLM | Ollama (llama3.2) / Gemini API |
| PDF Parsing | pdfplumber |
- Python 3.10+
- Node.js 18+
- LLM Backend (λ μ€ νλ μ ν)
- Ollama (무λ£, λ‘컬 μ€ν) β κΆμ₯
- Gemini API Key (Google AI Studioμμ λ°κΈ)
https://ollama.com/download μ°Έκ³ νκ±°λ μλ λͺ λ Ήμ΄λ‘ μ€μΉ:
curl -fsSL https://ollama.com/install.sh | shλͺ¨λΈ λ€μ΄λ‘λ:
ollama pull llama3.2https://aistudio.google.com μμ API ν€ λ°κΈ ν .env μ μ
λ ₯.
cd backend
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
cp ../.env.example .env
# .env νμΌμμ LLM_BACKEND, API ν€ λ± μ€μ
uvicorn main:app --reloadcd frontend
npm install
npm run devμ μ: http://localhost:3000 API λ¬Έμ: http://localhost:8000/docs
# LLM μ€μ
LLM_BACKEND=ollama # "ollama" λλ "gemini"
OLLAMA_MODEL=llama3.2 # Ollama μ¬μ© μ λͺ¨λΈλͺ
GEMINI_API_KEY=your_key # Gemini μ¬μ© μ API ν€
# ChromaDB
CHROMA_PERSIST_DIR=./chroma_db
ANONYMIZED_TELEMETRY=Falsepaperprobe/
βββ backend/
β βββ main.py
β βββ routers/
β β βββ upload.py # POST /api/upload, GET/DELETE /api/papers
β β βββ query.py # POST /api/query (streaming)
β β βββ compare.py # POST /api/compare (streaming, max 5)
β β βββ graph.py # POST /api/graph
β β βββ security.py # GET /api/security/{paper_id}
β βββ services/
β β βββ pdf_parser.py # ν
μ€νΈ μΆμΆ λ° μ²νΉ
β β βββ embedder.py # μλ² λ© μμ± λ° ChromaDB μ μ₯
β β βββ retriever.py # μ μ¬ μ²ν¬ κ²μ
β β βββ comparator.py # λ€μ€ λ
Όλ¬Έ λΉκ΅ ν둬ννΈ μμ±
β β βββ graph_builder.py # μ½μ¬μΈ μ μ¬λ κ³μ°
β β βββ injection_detector.py # μλ§¨ν± κΈ°λ° Prompt Injection νμ§
β βββ db/
β β βββ chroma.py
β β βββ sqlite.py
β βββ requirements.txt
βββ frontend/
β βββ app/
β β βββ page.tsx # λ©μΈ Q&A νμ΄μ§
β β βββ compare/page.tsx # λ
Όλ¬Έ λΉκ΅ νμ΄μ§
β β βββ graph/page.tsx # κ΄κ³λ κ·Έλν νμ΄μ§
β β βββ security/page.tsx # 보μ λΆμ νμ΄μ§
β βββ lib/
β βββ api.ts # API νΈμΆ ν¨μ λͺ¨μ
βββ .env.example
βββ README.md
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/upload |
PDF λ Όλ¬Έ μ λ‘λ |
| GET | /api/papers |
μ λ‘λλ λ Όλ¬Έ λͺ©λ‘ μ‘°ν |
| DELETE | /api/papers/{paper_id} |
λ Όλ¬Έ μμ |
| POST | /api/query |
λ¨μΌ λ Όλ¬Έ μ§μμλ΅ (μ€νΈλ¦¬λ°) |
| POST | /api/compare |
λ€μ€ λ Όλ¬Έ λΉκ΅ (μ€νΈλ¦¬λ°) |
| POST | /api/graph |
λ Όλ¬Έ μ μ¬λ κ·Έλν μμ± |
| GET | /api/security/{paper_id} |
Prompt Injection μ·¨μ½μ λΆμ |
λ³Έ νλ‘μ νΈλ RAG μμ€ν μ Prompt Injection μ·¨μ½μ μ μ°κ΅¬ λͺ©μ μΌλ‘ λΆμν©λλ€.
- νμ§: λ¨μ ν¨ν΄ λ§€μΉμ΄ μλ
all-MiniLM-L6-v2μλ² λ© κΈ°λ° μλ§¨ν± μ μ¬λλ‘ μ μ± μ§μλ¬Έ νμ§ β νκ΅μ΄/μμ΄ λͺ¨λ μ§μ - λ°©μ΄: λͺ¨λ RAG 쿼리μ μμ€ν λ°©μ΄ λ¬Έκ΅¬λ₯Ό μλ μ½μ νμ¬ λ Όλ¬Έ λ΄ μ μ± λͺ λ Ή μ€ν μ°¨λ¨
- μκ°ν: νμ§λ μμ¬ μ²ν¬λ₯Ό μνλ(high/medium/low)λ³λ‘ UIμμ νμΈ κ°λ₯
Becky | AI Security Research @ Hanyang University ERICA, ACE-LAB GitHub: paperprobe



