Trilingual AI Shariah Chatbot for Islamic Finance
✨ Ask questions in Arabic (العربية), English, or Bahasa Melayu
Powered by Ollama (Free & Local) or Claude Haiku (High Quality) with RAG from authoritative Shariah sources.
| Requirement | Version | Notes |
|---|---|---|
| Python | 3.11+ | Required |
| Ollama | Latest | For local LLM inference |
| Pinecone | Free tier | For vector database |
| Supabase | Free tier | For PostgreSQL + Storage |
# Clone the repository
git clone <repository-url>
cd for-ummah
# Create virtual environment (recommended)
python -m venv venv
source venv/bin/activate # On macOS/Linux
# OR
.\venv\Scripts\activate # On Windows
# Install Python packages
pip install -r requirements.txt
Dependencies include:
- `fastapi`, `uvicorn` - Backend API
- `streamlit` - Web UI
- `pinecone` - Vector database
- `pymupdf` - PDF text extraction
- `playwright` - Web scraping with WAF bypass
- `requests`, `beautifulsoup4` - Web scraping
# macOS
brew install ollama
# Linux
curl -fsSL https://ollama.com/install.sh | sh
# Windows - Download from https://ollama.com
Pull required models:
# Start Ollama service
ollama serve
# In another terminal, pull models
ollama pull llama3.2 # Main LLM for chat
ollama pull nomic-embed-text # Embeddings for RAG
# Copy example config
cp .env.example .env
# Edit .env with your Pinecone and Supabase credentials
Required .env variables:
# Pinecone (required for vector DB)
PINECONE_API_KEY=your-pinecone-api-key
PINECONE_INDEX=shariah-kb
# Supabase (required for database + storage)
SUPABASE_URL=https://xxxxx.supabase.co
SUPABASE_KEY=your-supabase-key
# Optional Auth (Required only for Claude)
ANTHROPIC_API_KEY=sk-ant-api03-...
# Optional settings
DATA_DIR=data
LOG_LEVEL=INFO
# Optional RAG tuning
RAG_RELEVANCE_THRESHOLD=0.60 # Min relevance score (0.60-0.70)
RAG_RERANK_TOP_K=25 # Number of results kept after reranking
Note: Ollama runs 100% locally for free. An Anthropic API key is only needed if you want to use the Claude Haiku model.
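A minimal sketch of how these variables might be validated at startup, so that a missing key fails early with a clear message. The helper name `load_settings`, the returned dictionary keys, and the fallback defaults (taken from the example values above) are assumptions for illustration, not the project's actual configuration code.

```python
import os

# Variables the README marks as required.
REQUIRED_KEYS = ("PINECONE_API_KEY", "PINECONE_INDEX", "SUPABASE_URL", "SUPABASE_KEY")


def load_settings(env=None):
    """Validate required variables and parse optional ones (hypothetical helper)."""
    env = os.environ if env is None else env
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing required .env variables: " + ", ".join(missing))
    return {
        "pinecone_index": env["PINECONE_INDEX"],
        "supabase_url": env["SUPABASE_URL"],
        # Claude is only usable when an Anthropic key is present.
        "claude_enabled": bool(env.get("ANTHROPIC_API_KEY")),
        "data_dir": env.get("DATA_DIR", "data"),
        "log_level": env.get("LOG_LEVEL", "INFO"),
        # Assumed defaults, copied from the example values in this README.
        "relevance_threshold": float(env.get("RAG_RELEVANCE_THRESHOLD", "0.60")),
        "rerank_top_k": int(env.get("RAG_RERANK_TOP_K", "25")),
    }
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to exercise without touching the real environment.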
Terminal 1 - Ollama (must be running):
ollama serve
Terminal 2 - API Backend:
# Recommended: Use the helper script (handles flags for you)
./run_server.sh
# Or manually (MUST use --loop asyncio for scraper to work):
uvicorn src.api.main:app --reload --port 8000 --loop asyncio
Terminal 3 - Streamlit UI:
streamlit run app.py
Access:
- Streamlit UI: http://localhost:8501
- API Docs: http://localhost:8000/docs
for-ummah/
├── app.py                      # Streamlit web UI
├── requirements.txt            # Python dependencies
├── .env.example                # Environment template
│
├── src/
│   ├── core/                   # Configuration, language detection
│   ├── db/                     # Supabase integration
│   │   ├── client.py           # Supabase client singleton
│   │   ├── models.py           # Pydantic data models
│   │   ├── storage.py          # Storage service (PDFs)
│   │   └── repositories/       # Database repositories
│   │       ├── documents.py    # Document CRUD
│   │       ├── chat.py         # Chat sessions/messages
│   │       ├── ingestion.py    # Ingestion history
│   │       └── job_status.py   # Background job status
│   ├── scrapers/               # Web scrapers (BNM, SC Malaysia)
│   ├── processors/             # PDF extraction, text chunking
│   ├── vector_db/              # Pinecone + Ollama embeddings
│   ├── ai/                     # RAG pipeline, prompts, Ollama + Claude LLMs
│   ├── services/               # Business logic
│   │   ├── chat.py             # ChatService orchestrator
│   │   ├── history.py          # Chat history (Supabase)
│   │   └── ingestion.py        # Document ingestion pipeline
│   └── api/                    # FastAPI endpoints
│
├── scripts/
│   ├── reindex_with_pages.py   # Re-process PDFs with page tracking
│   ├── scrape_url.py           # Download & index PDF from URL
│   └── translate_claude.py     # (Optional) Batch translation tool
│
├── docs/
│   └── architecture.md         # System architecture diagrams
│
└── data/                       # Local cache (primary storage in Supabase)
| Component | Technology | Cost |
|---|---|---|
| LLM (Local) | Ollama llama3.2 | FREE (local) |
| LLM (Cloud) | Claude 3.5 Haiku | ~$0.001/query (High Quality) |
| Embeddings | Ollama nomic-embed-text | FREE (local) |
| Vector DB | Pinecone | Free tier |
| Database | Supabase PostgreSQL | Free tier |
| Storage | Supabase Storage | Free tier |
| Backend | FastAPI | - |
| Frontend | Streamlit | - |
| PDF Extraction | PyMuPDF → Tesseract OCR (cascade) | FREE |
| Reranking | CrossEncoder (ms-marco-MiniLM) | FREE |
- Trilingual: Arabic (العربية), English, Bahasa Melayu
- Authoritative Sources: BNM, AAOIFI, SC Malaysia, JAKIM
- Hybrid AI: Choose between Ollama (free, local) and Claude Haiku (higher quality)
- High-Precision Reranking: CrossEncoder model boosts search relevance
- Query Translation: Auto-translates Malay/Arabic queries to English for better search precision
- Smart PDF: Page-level tracking with Arabic OCR support
- Source Verification: Clickable citations with the exact quote, a page preview image, and highlighted text
- Chat History: Persistent conversation sessions stored in Supabase
- Source Management: Upload PDFs or add by URL directly in the UI
- Automated Updates: Scheduled background scraper (APScheduler) checks for new BNM/SC documents
- Admin Dashboard: Monitor document counts, storage, and system health, and trigger manual updates
- Cloud Storage: All documents stored in Supabase Storage with secure access
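As a rough illustration of the reranking feature listed above, the sketch below filters scored passages by a minimum relevance score and keeps the best few, which is how settings such as `RAG_RELEVANCE_THRESHOLD` and `RAG_RERANK_TOP_K` could plausibly be applied. The function name, the `(text, score)` input shape, and the sample scores are hypothetical. Score computation by the CrossEncoder is not shown, and this README does not state which stage the threshold is applied to.

```python
def select_passages(scored, threshold=0.60, top_k=25):
    """Keep passages scoring at or above `threshold`, best first, at most `top_k`.

    `scored` is an iterable of (text, score) pairs (assumed shape).
    """
    kept = [(text, score) for text, score in scored if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


if __name__ == "__main__":
    candidates = [("passage A", 0.9), ("passage B", 0.5), ("passage C", 0.7)]
    for text, score in select_passages(candidates, threshold=0.6, top_k=2):
        print(f"{score:.2f}  {text}")
```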
# Process all PDFs and index with page tracking
python scripts/reindex_with_pages.py
Tip: You can also trigger an update from the Admin Dashboard without running scripts manually. Send a POST request to `/admin/trigger-update` or use the UI button.
# Download and index a PDF directly from URL
python scripts/scrape_url.py "https://example.com/document.pdf"
# With custom title and source
python scripts/scrape_url.py "URL" --title "Custom Title" --source BNM
This will:
- Extract text from PDFs with sentence-based chunking
- Preserve page numbers for source citations
- Upload to Pinecone with metadata
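The first two steps above can be sketched as follows: text is split into sentences, whole sentences are grouped into chunks up to a size limit, and each chunk records the page it came from. This is a simplified illustration, not the project's chunker. The function name `chunk_pages`, the `max_chars` limit, the naive punctuation-based sentence split, and the output dictionary shape are all assumptions.

```python
import re


def chunk_pages(pages, max_chars=500):
    """Group whole sentences into chunks, tagging each chunk with its page number.

    `pages` is a list of (page_number, text) pairs (assumed input shape).
    """
    chunks = []
    for page_number, text in pages:
        # Naive split after ., !, ? or the Arabic question mark, followed by whitespace.
        sentences = [s.strip() for s in re.split(r"(?<=[.!?؟])\s+", text) if s.strip()]
        current = ""
        for sentence in sentences:
            if current and len(current) + 1 + len(sentence) > max_chars:
                # Adding this sentence would exceed the limit: close the chunk.
                chunks.append({"page": page_number, "text": current})
                current = sentence
            else:
                current = f"{current} {sentence}".strip()
        if current:
            chunks.append({"page": page_number, "text": current})
    return chunks
```

Chunks never span pages in this sketch, which keeps the page number in each citation unambiguous at the cost of some short chunks at page boundaries.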
| Endpoint | Method | Description |
|---|---|---|
| `/chat` | POST | Main chat endpoint |
| `/health` | GET | Health check |
| `/docs` | GET | Swagger documentation |
| `/history/chats` | GET | List all chat sessions |
| `/history/chat/{id}` | GET | Get specific chat session |
| `/history/chat` | POST | Create/update chat session |
| `/history/sources` | GET | List all indexed sources |
| `/ingest/url` | POST | Ingest document from URL |
| `/ingest/upload` | POST | Upload and ingest PDF |
| `/pdf/{source}/{filename}` | GET | Serve PDF file |
| `/pdf/list` | GET | List available PDFs |
| `/admin/trigger-update` | POST | Trigger scraper update |
| `/admin/job-status` | GET | Get background job status |
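The `/chat` endpoint can also be called from Python using only the standard library. The sketch below sends the same `question`, `language`, and `model` fields as the curl example in this section. The helper names `build_chat_request` and `ask` are hypothetical, and the response is returned as parsed JSON without assuming its schema, which this README does not document.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/chat"  # Default local backend address from this README


def build_chat_request(question, language="en", model="claude", url=API_URL):
    """Build a POST request carrying the JSON body expected by /chat."""
    payload = json.dumps(
        {"question": question, "language": language, "model": model}
    ).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question, **kwargs):
    """Send the request and return the decoded JSON response."""
    request = build_chat_request(question, **kwargs)
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


if __name__ == "__main__":
    # Requires the API backend to be running on port 8000.
    print(ask("What is Murabaha?"))
```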
Example API call:
curl -X POST "http://localhost:8000/chat" \
-H "Content-Type: application/json" \
-d '{
"question": "What is Murabaha?",
"language": "en",
"model": "claude"
}'
# Check if Ollama is running
curl http://localhost:11434/api/tags
# Restart Ollama
ollama serve
- Verify your API key in `.env`
- Check index name matches `PINECONE_INDEX`
- Ensure index exists in Pinecone dashboard
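The Ollama check above (`curl http://localhost:11434/api/tags`) can also be scripted, for example to confirm that both required models are pulled before starting the backend. This is a sketch under assumptions: the helper name `ollama_models` is hypothetical, and it reads the `models` list with a `name` field from the `/api/tags` response.

```python
import json
import urllib.error
import urllib.request


def ollama_models(base_url="http://localhost:11434", timeout=3):
    """Return installed model names, or None if the Ollama server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
            data = json.load(response)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply.
        return None
    return [model.get("name", "") for model in data.get("models", [])]


if __name__ == "__main__":
    models = ollama_models()
    if models is None:
        print("Ollama is not reachable; start it with `ollama serve`.")
    else:
        for required in ("llama3.2", "nomic-embed-text"):
            found = any(name.startswith(required) for name in models)
            print(f"{required}: {'ok' if found else 'missing, run ollama pull'}")
```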
- Digital PDFs: Handled by PyMuPDF
- Scanned PDFs: Requires Tesseract OCR
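The two cases above form a cascade: try the embedded text layer first and fall back to OCR only when a page yields little or no text. A minimal sketch of that decision follows, with the two extractors passed in as callables so it stays independent of any library. The function name, the `min_chars` cutoff, and the returned `(text, method)` pair are assumptions, not the project's actual logic.

```python
def extract_page_text(page, extract_digital, extract_ocr, min_chars=20):
    """Return (text, method) for one page, preferring the digital text layer.

    `extract_digital` and `extract_ocr` are caller-supplied callables taking the page.
    """
    text = (extract_digital(page) or "").strip()
    if len(text) >= min_chars:
        return text, "digital"
    # Too little embedded text: treat the page as scanned and run OCR.
    return (extract_ocr(page) or "").strip(), "ocr"


if __name__ == "__main__":
    # Stand-in extractors for demonstration only.
    scanned_page = object()
    print(extract_page_text(scanned_page, lambda p: "", lambda p: "text recovered by OCR"))
```

With PyMuPDF, `extract_digital` could be as simple as `lambda page: page.get_text()`; the OCR callable would wrap whatever Tesseract binding the project uses.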
# Install Tesseract (optional for scanned PDFs)
# macOS
brew install tesseract tesseract-lang
# Ubuntu
sudo apt install tesseract-ocr tesseract-ocr-ara
Built for the Ummah
For questions or contributions, please open an issue on GitHub.