The Historically Bounded AI Assistant is an offline Retrieval-Augmented Generation (RAG) system designed to answer questions using only knowledge available before December 31, 1899.
Unlike modern AI models that have access to the internet and 21st-century data, this system is engineered to simulate an intelligent agent existing in the late Victorian era (c. 1890). Strict temporal guardrails reject any query related to modern technology, ensuring a historically immersive and safe experience.
Key Definition: This is an "Offline-First" AI system suitable for secure environments, academic demonstrations, and resume portfolios.
The system follows a modular RAG architecture:
1. Era-Specific Guardrails:
   - Concept: A filtering layer that intercepts user queries before they reach the brain of the AI.
   - Logic: Regex-based keyword spotting blocks terms like "Internet", "Nuclear", "iPhone", and post-1900 dates.
   - Outcome: Zero-shot refusal of anachronistic queries.
2. Retrieval-Augmented Generation (RAG):
   - Vector Database (FAISS): Stores mathematical representations (embeddings) of historical texts (e.g., On the Origin of Species, the Gettysburg Address).
   - Semantic Search: Finds the most relevant 19th-century text chunks matching the user's query.
   - Context Assembly: Injects these historical facts into the prompt given to the Language Model.
3. Offline Inference Engine:
   - Mock Mode (Default Demo): A lightweight heuristic engine for standard queries (Newton, Darwin, etc.), used for quick demos without downloading 10 GB+ models.
   - Full ML Mode: Supports local execution of open-source LLMs (such as Llama-2 or Mistral) via HuggingFace Transformers, generating novel text from retrieved context.
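The retrieve-and-assemble flow described above can be sketched as follows. This is a minimal illustration only: FAISS and the real embedding model are replaced with toy vectors and a brute-force cosine search, and the names (`CHUNKS`, `retrieve`, `build_prompt`) are hypothetical, not the project's actual API.

```python
import numpy as np

# Toy corpus standing in for the chunked historical texts.
CHUNKS = [
    "Natural selection preserves favourable variations. (Darwin, 1859)",
    "Four score and seven years ago... (Lincoln, 1863)",
]
# Pretend embeddings, one row per chunk (the real system would use
# Sentence-Transformers vectors stored in a FAISS index).
EMBEDDINGS = np.array([[1.0, 0.1], [0.1, 1.0]])

def retrieve(query_vec, k=1):
    """Return the k chunks most similar to the query vector (cosine)."""
    norms = np.linalg.norm(EMBEDDINGS, axis=1) * np.linalg.norm(query_vec)
    scores = EMBEDDINGS @ query_vec / norms
    top = np.argsort(scores)[::-1][:k]
    return [CHUNKS[i] for i in top]

def build_prompt(question, context):
    """Inject the retrieved historical facts ahead of the question."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}"

prompt = build_prompt("What is natural selection?",
                      retrieve(np.array([1.0, 0.0])))
print(prompt)
```

The same three steps (embed, search, assemble) apply unchanged when the toy arrays are swapped for a real FAISS index over the ingested chunks.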
| Component | Tool / Library | Purpose |
|---|---|---|
| Frontend | Streamlit | Creates the responsive, parchment-themed web UI. |
| Language | Python 3.9+ | Core logic and orchestration. |
| Vector DB | FAISS (CPU) | Fast, local similarity search for document chunks. |
| Embeddings | Sentence-Transformers | Converts text to vectors (Optional/Mock supported). |
| Validation | Regex | Strict pattern matching for guardrails. |
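Based on the stack above, a minimal `requirements.txt` might look like the following (contents and the decision to leave versions unpinned are assumptions, not the project's actual file):

```
streamlit
faiss-cpu
sentence-transformers
numpy
```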
The system is optimized for topics well-documented and understood in the 19th Century:
- Physics: "Who is Isaac Newton?", "What is light?", "Explain electricity."
- Biology: "What is natural selection?", "Explain Darwin's theory."
- History: "Who is Abraham Lincoln?", "What happened at Gettysburg?"
- General Knowledge: "What is an electron?" (Includes 1897 discovery context).
The system will strictly refuse to answer queries containing anachronisms:
- Modern Tech: "What is an iPhone?", "How does WiFi work?"
- Future Events: "Who won the World War?", "What is the United Nations?"
- Computing: "Write a Python script", "What is AI?"
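The refusal behavior above can be sketched with a small regex guardrail. The pattern list and function name here are illustrative; the real `guardrails.py` likely uses a longer blocklist and more nuanced date handling.

```python
import re

# Illustrative blocklist of post-1899 terms (a real deployment
# would carry many more entries).
BLOCKED_TERMS = re.compile(
    r"\b(internet|wifi|iphone|nuclear|python|united nations)\b",
    re.IGNORECASE,
)
# Reject any explicit year from 1900 through 2099.
POST_1899_YEAR = re.compile(r"\b(19|20)\d{2}\b")

def is_anachronistic(query: str) -> bool:
    """Return True if the query mentions post-1899 terms or dates."""
    return bool(BLOCKED_TERMS.search(query) or POST_1899_YEAR.search(query))

print(is_anachronistic("How does WiFi work?"))     # True
print(is_anachronistic("Who is Abraham Lincoln?")) # False
```

Because the check runs before retrieval, blocked queries never reach the vector database or the model, which is what makes the refusal "zero-shot".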
Clone the repository and install dependencies:
```shell
pip install -r requirements.txt
```

Ingest the raw historical data into the vector database:

```shell
python app/build_knowledge_base.py
```

Launch the web UI:

```shell
streamlit run app/ui.py
# Open http://localhost:8501
```

```
historical-ai/
├── app/
│   ├── main.py                  # Core backend logic & CLI entry point
│   ├── ui.py                    # Streamlit frontend code (CSS & Layout)
│   ├── guardrails.py            # Safety filters (keywords & dates)
│   ├── retriever.py             # FAISS vector database wrapper
│   ├── llm.py                   # Offline Model wrapper (Mock & Real modes)
│   └── build_knowledge_base.py  # Data ingestion pipeline
├── data/
│   ├── raw/                     # Original text files (Darwin, Lincoln)
│   └── chunks/                  # Processed JSON chunks
├── vector_db/                   # Stored FAISS index files
├── evaluation/                  # Test suites for guardrails
└── requirements.txt             # Python dependencies
```
- Mock Mode (Current Default):
  - Runs instantly without heavy downloads.
  - Uses a "Dictionary of Knowledge" to answer specific demo questions (Newton, Darwin, etc.).
  - Provides a generic "Consulting Archives..." response for unknown valid queries.
- Real Mode:
  - Requires downloading a `.gguf` or HuggingFace model.
  - Generates organic, word-by-word answers from the LLM.
  - To enable: Install `torch` and run with the `--model_path` argument.
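The Mock/Real switch can be sketched as a single wrapper class. The class name, method names, and the sample knowledge entries below are hypothetical, not the project's actual `llm.py` API; Real Mode is stubbed out to keep the sketch dependency-free.

```python
# Toy "Dictionary of Knowledge" standing in for the mock engine's data.
MOCK_KNOWLEDGE = {
    "newton": "Sir Isaac Newton set out the laws of motion in 1687.",
    "darwin": "Mr. Darwin holds that species change through natural selection.",
}

class OfflineLLM:
    def __init__(self, model_path=None):
        # No model path -> Mock Mode (no heavy downloads).
        self.model_path = model_path

    def generate(self, prompt: str) -> str:
        if self.model_path is None:
            return self._mock(prompt)
        # Real Mode would load the local .gguf / HuggingFace checkpoint
        # here via transformers; omitted in this sketch.
        raise NotImplementedError("Real Mode requires torch + transformers")

    def _mock(self, prompt: str) -> str:
        lowered = prompt.lower()
        for key, answer in MOCK_KNOWLEDGE.items():
            if key in lowered:
                return answer
        # Unknown but era-valid queries fall back to a generic reply.
        return "Consulting Archives..."

llm = OfflineLLM()
print(llm.generate("Who is Isaac Newton?"))
print(llm.generate("What is light?"))  # -> "Consulting Archives..."
```

Keeping both modes behind one `generate` method means the UI and guardrails are unchanged regardless of which engine is active.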
"The past is a foreign country; they do things differently there." — L.P. Hartley (1953) [Note: Quote falls outside knowledge cutoff, system would refuse to cite this.]