DocuBot is a Retrieval-Augmented Generation (RAG) pipeline built with React, Node.js, LangChain, Qdrant, and Google Gemini. It enhances user prompts, retrieves semantically relevant chunks, and generates answers grounded in the retrieved text.
- Backend repo - https://github.com/zanzo2003/DocuBot_backend
- Frontend repo - https://github.com/zanzo2003/DocuBot_frontend
- Advanced RAG pipeline: prompt enhancement → retrieval → grounded generation
- Prompt Enhancer: rewrites and normalizes queries to improve retrieval recall
- Semantic retrieval with Qdrant (top-k chunk search)
- PDF ingestion: loading, chunking, and embedding via LangChain
- Modular backend: services, controllers, routes, utils, db, middleware for easy extension
- Ingest: PDF → chunk → embed → store in Qdrant
- Enhance: the user query is rewritten by an LLM to improve retrieval quality
- Retrieve: semantic search returns the top-k most relevant chunks
- Answer: the LLM generates a response grounded in the retrieved context
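
The four steps above can be sketched as a toy, in-memory pipeline. This is not the project's implementation: the bag-of-words `embed` function stands in for GoogleGenerativeAIEmbeddings, the `store` array stands in for a Qdrant collection, and `enhanceQuery` only normalizes text where the real enhancer calls an LLM. All function names, chunk sizes, and sample documents are assumptions made for illustration.

```javascript
// Toy sketch of ingest -> enhance -> retrieve -> answer (all names hypothetical).

// Ingest: split text into fixed-size chunks that overlap by `overlap` characters.
function chunkText(text, size = 200, overlap = 40) {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // last chunk reached the end
  }
  return chunks;
}

// Stand-in embedding: a sparse term-count vector (the real pipeline uses Gemini embeddings).
function embed(text) {
  const counts = new Map();
  for (const token of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  return counts;
}

// Cosine similarity between two sparse vectors.
function cosine(a, b) {
  let dot = 0;
  for (const [token, weight] of a) dot += weight * (b.get(token) ?? 0);
  const norm = (v) => Math.sqrt([...v.values()].reduce((sum, w) => sum + w * w, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

// Enhance: placeholder for the LLM rewrite; only trims, collapses spaces, lowercases.
function enhanceQuery(query) {
  return query.trim().replace(/\s+/g, " ").toLowerCase();
}

// Retrieve: score every stored chunk against the query and keep the top k.
function retrieve(store, query, k = 2) {
  const queryVector = embed(query);
  return store
    .map((entry) => ({ text: entry.text, score: cosine(queryVector, entry.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Answer: assemble the grounded prompt that would be sent to the LLM.
function buildPrompt(query, hits) {
  const context = hits.map((hit, i) => `[${i + 1}] ${hit.text}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${query}`;
}

// Sample data (invented for this sketch).
const docs = [
  "Qdrant stores the embedding vectors.",
  "React renders the chat interface.",
  "Gemini generates the final answer.",
];
const store = docs.map((text) => ({ text, vector: embed(text) }));
const query = enhanceQuery("  Where are   VECTORS stored? ");
const hits = retrieve(store, query, 1);
console.log(buildPrompt(query, hits));
```

Keeping each stage a separate function mirrors the service split described in the feature list: swapping the toy `embed` and `store` for real embedding and vector-database clients would leave the other stages unchanged.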
Frontend: React
Backend: Node.js, Express
AI/ML: LangChain, Google Generative AI (Gemini), GoogleGenerativeAIEmbeddings
Vector Database: Qdrant
DocuBot/
├─ frontend/
├─ backend/
├─ screenshots/
└─ README.md
- Clone the repository

      git clone https://github.com/your-username/DocuBot.git
      cd DocuBot

- Set up the backend

      cd backend
      npm install

- Configure the environment

  Create a .env file in the backend directory:

      PORT=8080
      QDRANT_URL=http://localhost:6333
      QDRANT_COLLECTION=docubot_collection
      GOOGLE_API_KEY=your_google_api_key
      GENAI_BASE_URI=https://generativelanguage.googleapis.com/v1beta/openai
      FRONTEND_URL=http://localhost:3000
      NODE_ENV=development

- Start the backend

      npm run dev

- Set up the frontend

      cd ../frontend
      npm install
      npm start
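
A backend typically validates these settings once at startup so that a missing key fails fast instead of surfacing later as a confusing request error. The sketch below shows one way to do that. The variable names come from the .env example; the `loadConfig` function, the choice of which keys are required, and the fallback defaults are assumptions, not the project's actual code.

```javascript
// Hypothetical config loader; only the variable names come from the .env example.
// Which keys count as required is an assumption made for this sketch.
const REQUIRED_KEYS = ["QDRANT_URL", "QDRANT_COLLECTION", "GOOGLE_API_KEY"];

function loadConfig(env) {
  // Collect every missing key so the error lists them all at once.
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  const port = Number.parseInt(env.PORT ?? "8080", 10);
  if (Number.isNaN(port)) {
    throw new Error(`PORT is not a number: ${env.PORT}`);
  }

  return {
    port,
    qdrantUrl: env.QDRANT_URL,
    qdrantCollection: env.QDRANT_COLLECTION,
    googleApiKey: env.GOOGLE_API_KEY,
    genaiBaseUri: env.GENAI_BASE_URI,
    frontendUrl: env.FRONTEND_URL ?? "http://localhost:3000",
    nodeEnv: env.NODE_ENV ?? "development",
  };
}

// Example with an explicit object; a server would pass process.env instead.
const exampleConfig = loadConfig({
  PORT: "8080",
  QDRANT_URL: "http://localhost:6333",
  QDRANT_COLLECTION: "docubot_collection",
  GOOGLE_API_KEY: "your_google_api_key",
});
console.log(exampleConfig.port, exampleConfig.qdrantUrl);
```

Passing the environment in as an argument, rather than reading `process.env` inside the function, keeps the loader easy to test.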
    curl -X POST http://localhost:8080/api/files/upload \
      -F "file=@/path/to/file.pdf"

    curl -X POST http://localhost:8080/api/chat \
      -H "Content-Type: application/json" \
      -d '{
        "query": "Summarize section 2",
        "fileName": "file.pdf"
      }'

    curl -X DELETE http://localhost:8080/api/files/file.pdf

Add your screenshots to the ./screenshots/ directory.
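
The upload, chat, and delete requests shown as curl commands can also be issued from Node.js. The sketch below builds the same three requests for the built-in `fetch` (Node.js 18 or later). The endpoint paths, the `file` form field, and the `query`/`fileName` JSON fields come from the curl examples; the helper names are hypothetical, and the response format is not documented here, so `send` returns the raw response.

```javascript
// Hypothetical request builders mirroring the curl examples.
const BASE_URL = "http://localhost:8080";

// POST /api/files/upload with the PDF bytes in the "file" multipart field.
function uploadRequest(pdfBytes, fileName, baseUrl = BASE_URL) {
  const form = new FormData();
  form.append("file", new Blob([pdfBytes], { type: "application/pdf" }), fileName);
  return { url: `${baseUrl}/api/files/upload`, init: { method: "POST", body: form } };
}

// POST /api/chat with a JSON body naming the question and the target file.
function chatRequest(query, fileName, baseUrl = BASE_URL) {
  return {
    url: `${baseUrl}/api/chat`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ query, fileName }),
    },
  };
}

// DELETE /api/files/<fileName>; the name is URL-encoded in case it has spaces.
function deleteRequest(fileName, baseUrl = BASE_URL) {
  return {
    url: `${baseUrl}/api/files/${encodeURIComponent(fileName)}`,
    init: { method: "DELETE" },
  };
}

// Send a built request and fail loudly on non-2xx status codes.
async function send({ url, init }) {
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`${init.method} ${url} failed with status ${response.status}`);
  }
  return response;
}
```

With the backend running, `await send(chatRequest("Summarize section 2", "file.pdf"))` would issue the same request as the second curl command.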
- Support DOCX / MD ingestion and automatic OCR
- Streaming LLM responses + token-level highlights
- Docker + k8s deployment & infra templates
- Per-user sessions, multi-tenant support, and auth
Suggested branches:
- main (stable)
- dev (work-in-progress)
Built with LangChain • Qdrant • Gemini


