A real-time, AI-driven FAQ resolution system that automatically answers user questions, deduplicates semantically similar queries, and keeps a living FAQ knowledge base in sync, powered by local LLM inference.

    User asks a question (text or voice)
                     ▼
┌──────────────────────────────────────────┐
│ 1. Voice → Web Speech API transcribes    │
│ 2. Embed question (nomic-embed-text)     │
│ 3. Cosine similarity against all FAQs    │
└──────────────────────────────────────────┘
          │                             │
  Similarity ≥ 0.82              No close match
          │                             │
Return existing answer     LLM generates formal answer
(FAQ reused, no dup)      (qwen2.5:latest on Metal GPU)
                                        │
                       FAQ auto-categorized (8 categories)
                                        │
                         Saved to MongoDB with embedding
                                        │
                        Socket.io → browser updates live
                                        │
                     Purple "New" badge + toast notification

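The flow above can be sketched as a single function. This is a minimal illustration, not the project's actual code: `resolveQuestion` and every dependency name passed in `deps` are hypothetical, and the real logic lives in server/services/aiResolver.js. The returned object mirrors the documented /api/chat response; which `question` is returned on reuse is an assumption.

```javascript
// Hypothetical sketch of the resolution flow; all names are illustrative.
const SIMILARITY_THRESHOLD = 0.82; // threshold stated in this README

async function resolveQuestion(question, deps) {
  const { embed, findClosestFaq, generateAnswer, categorize, saveFaq, broadcast } = deps;

  // Embed the question and look up the closest stored FAQ.
  const embedding = await embed(question);
  const match = await findClosestFaq(embedding); // { faq, similarity } or null

  if (match && match.similarity >= SIMILARITY_THRESHOLD) {
    // Close enough: reuse the stored answer and create nothing new.
    return {
      reply: match.faq.answer,
      faqId: match.faq._id,
      source: 'existing',
      category: match.faq.category,
      similarity: match.similarity,
      isNew: false,
      question: match.faq.question, // assumption: the stored wording is returned
    };
  }

  // No close match: generate, categorize, persist, then notify clients.
  const answer = await generateAnswer(question);
  const category = await categorize(question);
  const faq = await saveFaq({ question, answer, category, embedding, isAI: true });
  broadcast({ type: 'ai_faq_created', faq, createdAt: new Date().toISOString() });

  return {
    reply: answer,
    faqId: faq._id,
    source: 'generated',
    category,
    similarity: match ? match.similarity : 0,
    isNew: true,
    question,
  };
}
```

Passing the collaborators in as `deps` keeps the branch logic testable without MongoDB, Ollama, or Socket.io running.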
| Feature | Description |
|---|---|
| Local LLM | qwen2.5:latest (7.6B Q4_K_M) via Ollama: no API costs, fully offline |
| Semantic Search | Cosine similarity against all FAQ embeddings; catches paraphrases |
| Deduplication | Threshold 0.82: no duplicate FAQs for semantically identical questions |
| Auto-categorization | LLM classifies into 8 categories: AI/ML, Programming, Finance, Education, Healthcare, Cloud/DevOps, Design, General |
| Voice Input | Browser Web Speech API (Chrome/Edge): real-time transcription, no server round-trip |
| Live Grid Updates | New AI FAQs appear in the FAQ browser instantly via Socket.io |
| Toast Notifications | Purple AI-themed toast every time a new FAQ is auto-created |
| AI Badges | 🤖 on existing AI FAQs, ✨ "New" sparkle badge for 30s after creation |
| Formal FAQ Answers | LLM system prompt enforces professional, concise FAQ style |
| Full History | Chat threads stored in MongoDB with user attribution |
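
The Semantic Search and Deduplication rows rest on cosine similarity between embedding vectors. The sketch below shows one plain way to compute it and to pick the closest FAQ with a linear scan; the function names and the in-memory `faqs` array are assumptions for illustration, not the project's code.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // treat zero vectors as "no similarity"
}

// Linear scan over all FAQs (hypothetical shape: { embedding: number[] }).
// Returns the best { faq, similarity } pair, or null when the list is empty.
function findClosestFaq(queryEmbedding, faqs) {
  let best = null;
  for (const faq of faqs) {
    const similarity = cosineSimilarity(queryEmbedding, faq.embedding);
    if (best === null || similarity > best.similarity) best = { faq, similarity };
  }
  return best;
}
```

A caller would compare `best.similarity` with 0.82 to decide between reusing and generating. A linear scan is simple and adequate for a small FAQ set; a vector index would be the usual next step if the collection grows large.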
client/
  src/
    pages/
      ChatBot.jsx          - AI FAQ assistant (text + voice)
      FAQBrowser.jsx       - Live-updating FAQ grid
    components/
      FAQCard.jsx          - Card with AI/New badges
      Layout.jsx
    context/
      ToastContext.jsx     - Global toast notifications
      SocketContext.jsx    - Socket.io event bus
    App.jsx
server/
  services/
    ollama.js              - LLM chat, embeddings, category detection
    aiResolver.js          - Similarity check, FAQ upsert, broadcast
  routes/
    chat.js                - POST /api/chat (text + voice)
  models/
    FAQ.js                 - question, answer, category, embedding, isAI
    Activity.js            - faq_created, ai_response, ai_reuse events
Ollama models used:
- `qwen2.5:latest`: FAQ answer generation + category detection
- `nomic-embed-text`: 768-dim question embeddings
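
For illustration, a question embedding can be requested from a local Ollama server over its REST API. This sketch assumes Ollama's default address (http://localhost:11434) and its `/api/embed` endpoint; whether server/services/ollama.js uses this endpoint or a client library is not stated here, and `embedQuestion` is a hypothetical name.

```javascript
const OLLAMA_URL = 'http://localhost:11434'; // Ollama's default listen address

// Request one embedding vector for a piece of text.
// fetchImpl is injectable so the function can be exercised without a server.
async function embedQuestion(text, fetchImpl = fetch) {
  const res = await fetchImpl(`${OLLAMA_URL}/api/embed`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ model: 'nomic-embed-text', input: text }),
  });
  if (!res.ok) throw new Error(`Ollama embed request failed: ${res.status}`);
  const data = await res.json();
  return data.embeddings[0]; // one input string yields one vector
}
```

With `nomic-embed-text`, the returned array should have 768 entries, matching the dimension listed above.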
- Node.js 18+
- MongoDB (local or Atlas)
- Ollama v0.24+
ollama pull qwen2.5:latest # 7.6B Q4_K_M, ~4.7GB, runs on M3 Metal GPU
ollama pull nomic-embed-text # 768-dim embeddings, 274MB
ollama serve

MacBook Air M3: `qwen2.5:latest` uses ~4.7GB and runs on the Metal GPU. If you have limited RAM, fall back to `qwen2.5:3b`.
# server/.env
MONGO_URI=mongodb://localhost:27017/crowd
JWT_SECRET=your-secret-here
PORT=5001

# client/.env
VITE_API_URL=http://localhost:5001

cd server
npm install
node index.js
# → http://localhost:5001

cd client
npm install
npx vite --host 0.0.0.0
# → http://localhost:5173

# server/seeds/data.js contains demo users and categories
node seeds/seed.js # or run via MongoDB Compass / mongosh

Demo credentials:
- admin@crowd.faq / password123
- demo@crowd.faq / password123
POST /api/chat
Authorization: Bearer <token>
Content-Type: application/json
// Text question
{ "message": "How do I reset my password?" }
// Voice message (base64 WebM audio)
// Voice is transcribed via browser Web Speech API before sending
{ "voiceData": "<base64 audio>", "spokenText": "How do I reset my password?" }
Response:
{
"reply": "To reset your password, navigate to the login page...",
"faqId": "6789abc...",
"source": "existing | generated",
"category": "General",
"similarity": 0.87,
"isNew": false,
"question": "How do I reset my password?"
}

| Similarity | Outcome |
|---|---|
| ≥ 0.82 | Return existing FAQ answer (no new entry created) |
| < 0.82 | LLM generates new answer, saves FAQ with embedding |
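
As a rough illustration of the request and response documented above, a client could call the endpoint and branch on `source` like this. `askQuestion` and `describeSource` are hypothetical helpers, and the token is assumed to be a JWT obtained at login.

```javascript
// Hypothetical client helper for POST /api/chat (text question only).
// fetchImpl is injectable so the helper can be tested without a server.
async function askQuestion(baseUrl, token, message, fetchImpl = fetch) {
  const res = await fetchImpl(`${baseUrl}/api/chat`, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${token}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ message }),
  });
  if (!res.ok) throw new Error(`Chat request failed: ${res.status}`);
  return res.json();
}

// Turn a response into a short status label, based on the `source` field.
function describeSource(response) {
  return response.source === 'existing'
    ? `Reused FAQ ${response.faqId} (similarity ${response.similarity.toFixed(2)})`
    : `New FAQ created in category ${response.category}`;
}
```

Usage would look like `describeSource(await askQuestion('http://localhost:5001', token, 'How do I reset my password?'))`.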
The server emits on the activity channel:
{
"type": "ai_faq_created",
"faq": {
"_id": "...",
"question": "How does 2FA improve security?",
"answer": "Two-factor authentication...",
"category": "Cloud/DevOps"
},
"createdAt": "2026-05-29T..."
}

The FAQ Browser listens for these events and prepends new cards to the grid without a page refresh.
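
One way to handle these events on the client is a small pure function that prepends the new FAQ and records when its "New" badge should expire. This is a sketch under assumptions: the function names, the `newUntil` field, and the `'activity'` event name in the wiring comment are illustrative, not taken from FAQBrowser.jsx.

```javascript
const NEW_BADGE_MS = 30000; // "New" badge lifetime (30s) from the feature table

// Pure state update: prepend the new FAQ unless it is already in the list.
function applyActivityEvent(faqs, event, now = Date.now()) {
  if (event.type !== 'ai_faq_created' || !event.faq) return faqs;
  if (faqs.some((f) => f._id === event.faq._id)) return faqs; // ignore repeats
  return [{ ...event.faq, isAI: true, newUntil: now + NEW_BADGE_MS }, ...faqs];
}

// True while the card should still show the "New" badge.
function showsNewBadge(faq, now = Date.now()) {
  return typeof faq.newUntil === 'number' && now < faq.newUntil;
}

// Wiring sketch with socket.io-client, assuming the event name is 'activity':
//   socket.on('activity', (event) =>
//     setFaqs((prev) => applyActivityEvent(prev, event)));
```

Keeping the update pure makes it easy to use inside a React state setter and to test without a live socket.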

| File | Purpose |
|---|---|
| `server/services/ollama.js` | LLM chat, embeddings, category detection, audio transcription |
| `server/services/aiResolver.js` | Core pipeline: similarity check → generate/reuse → broadcast |
| `server/routes/chat.js` | REST endpoint: handles text + voice, returns structured response |
| `client/src/pages/ChatBot.jsx` | UI: voice recording, Web Speech API, markdown rendering, source badge |
| `client/src/pages/FAQBrowser.jsx` | UI: live grid updates via Socket.io, filter/search, pagination |
| `client/src/components/FAQCard.jsx` | UI: AI badge, New sparkle badge with purple glow animation |
| `client/src/context/ToastContext.jsx` | Global toast notification system |

To use a different Ollama model, edit server/services/ollama.js:
const LLM_MODEL = 'qwen2.5:14b' // larger, slower, more accurate
// const LLM_MODEL = 'qwen2.5:3b' // smaller, faster, less accurate

Restart the server after changing the model. The embedding model, nomic-embed-text, is fixed; it is a widely used Ollama embedding model.
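
To show where the model constant ends up, here is a hedged sketch of a chat request against Ollama's `/api/chat` endpoint. The system prompt wording, the function names, and the default Ollama address are assumptions for illustration; the project's actual prompt and request code in server/services/ollama.js may differ.

```javascript
const LLM_MODEL = 'qwen2.5:latest';

// Illustrative wording only; not the project's actual system prompt.
const SYSTEM_PROMPT = 'You write formal, concise FAQ answers.';

// Build the JSON body for Ollama's /api/chat endpoint.
function buildChatRequest(question, model = LLM_MODEL) {
  return {
    model,
    stream: false, // ask for one JSON response instead of a token stream
    messages: [
      { role: 'system', content: SYSTEM_PROMPT },
      { role: 'user', content: question },
    ],
  };
}

// Send the request and return the assistant's text.
// fetchImpl is injectable so the function can be tested without Ollama running.
async function generateAnswer(question, fetchImpl = fetch) {
  const res = await fetchImpl('http://localhost:11434/api/chat', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildChatRequest(question)),
  });
  if (!res.ok) throw new Error(`Ollama chat request failed: ${res.status}`);
  const data = await res.json();
  return data.message.content.trim();
}
```

Because the model name is a single argument to `buildChatRequest`, swapping models only changes one value in the request body.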
# Frontend
cloudflared tunnel --url http://localhost:5173
# Backend
cloudflared tunnel --url http://localhost:5001

Set VITE_API_URL in client/.env to your deployed backend URL and run:
cd client && npx vite build
# serve dist/ with nginx / Vercel / Netlify

Set origin in the server/index.js CORS config to your frontend domain.
- Frontend: React 18, Vite, React Router, Tailwind CSS, Lucide icons, Socket.io-client
- Backend: Node.js, Express, Mongoose, Socket.io, form-data
- LLM: Ollama (`qwen2.5:latest`, `nomic-embed-text`), runs locally via Metal GPU (M3)
- Database: MongoDB
- Voice: Browser Web Speech API (no external service)
- Auth: JWT (jsonwebtoken)