A forensic chat analysis tool built on a Retrieval-Augmented Generation (RAG) pipeline. It ingests structured phone book and chat session data, embeds them into a vector store, and answers natural-language questions grounded in the evidence.
Two-layer design:
- Generic RAG (`Services/`, `Interfaces/`, `Models/`) -- reusable embedding, vector search, and streaming answer generation
- Domain layer (`Domain/`) -- forensic-specific logic: participant resolution, aggregate queries, temporal filtering, pre-computed forensic facts
- Content-addressable caching -- SHA-256(filename + content) keys; skips OpenAI embedding calls when files haven't changed
- HyDE (Hypothetical Document Embeddings) -- generates a draft answer before embedding for better semantic retrieval
- Decorator pattern -- `DomainRagOrchestrator` wraps `RagOrchestrator` with domain enrichment
- Adapter pattern -- `OpenAiEmbeddingAdapter` bridges OpenAI SDK to the generic `IEmbeddingGenerator` interface
- Polly resilience -- exponential backoff with jitter for transient API failures (429, 5xx)
- Atomic file writes -- cache writes go to `.tmp` then `File.Move` to prevent corruption
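
As a rough illustration of how the caching, atomic-write, and retry ideas above could fit together, here is a minimal sketch. It is not the project's actual code: `EmbeddingCacheSketch`, its method names, the NUL separator in the key, the `.json` cache file extension, and the 200 ms base delay are all assumptions made for the example. The retry loop is hand-rolled rather than built with Polly, and it treats every `HttpRequestException` as transient instead of checking for 429/5xx specifically.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Security.Cryptography;
using System.Text;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not the project's real cache class.
public static class EmbeddingCacheSketch
{
    // SHA-256 over file name + content. A NUL byte separates the two parts
    // (an assumption of this sketch) so ("ab", "c") and ("a", "bc") differ.
    public static string ComputeKey(string fileName, string content)
    {
        byte[] input = Encoding.UTF8.GetBytes(fileName + "\0" + content);
        return Convert.ToHexString(SHA256.HashData(input)).ToLowerInvariant();
    }

    // Write to a sibling .tmp file, then move it over the target so a crash
    // mid-write cannot leave a truncated cache entry at the final path.
    public static void WriteAtomically(string path, string payload)
    {
        string tmp = path + ".tmp";
        File.WriteAllText(tmp, payload);
        File.Move(tmp, path, overwrite: true);
    }

    // Delay before retry `attempt` (0-based): baseDelay * 2^attempt,
    // plus random jitter in [0, baseDelay).
    public static TimeSpan BackoffDelay(int attempt, TimeSpan baseDelay, Random rng)
    {
        double exponential = baseDelay.TotalMilliseconds * Math.Pow(2, attempt);
        double jitter = rng.NextDouble() * baseDelay.TotalMilliseconds;
        return TimeSpan.FromMilliseconds(exponential + jitter);
    }

    // Return the cached payload if the key exists; otherwise call `embed`
    // (a stand-in for the embedding API call), retrying with backoff.
    public static async Task<string> GetOrCreateAsync(
        string cacheDir, string fileName, string content,
        Func<string, Task<string>> embed, int maxRetries = 3)
    {
        Directory.CreateDirectory(cacheDir); // no-op if it already exists
        string path = Path.Combine(cacheDir, ComputeKey(fileName, content) + ".json");
        if (File.Exists(path))
            return await File.ReadAllTextAsync(path); // cache hit: no API call

        for (int attempt = 0; ; attempt++)
        {
            try
            {
                string payload = await embed(content);
                WriteAtomically(path, payload);
                return payload;
            }
            catch (HttpRequestException) when (attempt < maxRetries)
            {
                // Simplification: any HTTP failure is retried until maxRetries.
                await Task.Delay(BackoffDelay(attempt, TimeSpan.FromMilliseconds(200), Random.Shared));
            }
        }
    }
}
```

Because the key depends on both the file name and its content, renaming or editing a file produces a new key and triggers a fresh embedding call, while an unchanged file is served from disk.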
- .NET 10 SDK
- OpenAI API key with access to chat and embedding models
- Clone the repository
- Create a `.env` file in the project root: `OPENAI_API_KEY=sk-...`
- Place forensic data files:
  - `phonebook.txt` -- raw phone book export
  - `chats/` -- raw chat export directory
Interactive mode:

    dotnet run

Commands in interactive mode:

| Command | Description |
|---|---|
| `exit` / `quit` | Close the application |
| `exit with save` | Save conversation history as JSON and exit |
| `/clear` | Reset conversation context |
| `/help` | Show help |
| `Ctrl+C` | Cancel the current query |

Running tests:

    dotnet test                                              # all tests
    dotnet test --filter "FullyQualifiedName~Unit"           # unit only
    dotnet test --filter "FullyQualifiedName~Functional"     # functional only
    dotnet test --verbosity normal                           # verbose output

Validation test suite:

    dotnet run -- --test

Runs 28 automated test cases covering:

- Phone book queries (contacts, phone numbers, chat file IDs)
- Temporal filtering (monthly activity, busiest day)
- Aggregate queries (all contacts in a period, cross-file counts)
- Comparison and exclusion queries
- Misattribution detection
- Single-chat-max and timeline edge cases
- Linux
- Page Index
- GraphRAG
- RAPTOR
- MCP (drop interactive CLI)
- CRUD operations for RAG data
- Mutation tests
- Line coverage minimum 95%, mutation coverage minimum 95%
- Dockerfile