Contacts and chat RAG

A forensic chat analysis tool built on a Retrieval-Augmented Generation (RAG) pipeline. It ingests structured phone book and chat session data, embeds it in a vector store, and answers natural-language questions grounded in that evidence.

Architecture

Two-layer design:

- Generic RAG (Services/, Interfaces/, Models/) -- reusable embedding, vector search, and streaming answer generation
- Domain layer (Domain/) -- forensic-specific logic: participant resolution, aggregate queries, temporal filtering, pre-computed forensic facts
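
The split between the two layers can be pictured as a wrapper around a generic pipeline. The following is only a minimal sketch under stated assumptions: `IOrchestrator`, `GenericOrchestrator`, `DomainOrchestrator`, and the `enrich` delegate are hypothetical stand-ins invented for illustration, not the actual types or signatures in the repository.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical contract shared by both layers (not taken from the repository).
public interface IOrchestrator
{
    Task<string> AnswerAsync(string question);
}

// Stand-in for the generic layer: it knows nothing about forensics.
public sealed class GenericOrchestrator : IOrchestrator
{
    public Task<string> AnswerAsync(string question) =>
        Task.FromResult($"answer to: {question}");
}

// Stand-in for the domain layer: enriches the question, then delegates.
public sealed class DomainOrchestrator : IOrchestrator
{
    private readonly IOrchestrator _inner;
    private readonly Func<string, string> _enrich;

    public DomainOrchestrator(IOrchestrator inner, Func<string, string> enrich)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _enrich = enrich ?? throw new ArgumentNullException(nameof(enrich));
    }

    public Task<string> AnswerAsync(string question) =>
        _inner.AnswerAsync(_enrich(question));
}
```

In this arrangement the generic layer stays reusable because all forensic knowledge lives in the wrapper; swapping the `enrich` step does not touch the inner pipeline.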

Design Patterns

- Content-addressable caching -- cache keys are SHA-256(filename + content); OpenAI embedding calls are skipped when files haven't changed
- HyDE (Hypothetical Document Embeddings) -- generates a draft answer before embedding for better semantic retrieval
- Decorator pattern -- DomainRagOrchestrator wraps RagOrchestrator with domain enrichment
- Adapter pattern -- OpenAiEmbeddingAdapter bridges the OpenAI SDK to the generic IEmbeddingGenerator interface
- Polly resilience -- exponential backoff with jitter for transient API failures (HTTP 429, 5xx)
- Atomic file writes -- cache writes go to a .tmp file first, then File.Move puts it in place, to prevent corruption
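
Three of the patterns above (content-addressed keys, atomic writes, and backoff with jitter) are small enough to sketch directly. This is an illustration under assumptions, not the repository's code: the class name `PatternSketches`, the method names, the default base and cap values, and the choice of the "full jitter" variant are all hypothetical. The backoff method deliberately does not call Polly; it only shows the arithmetic behind the idea.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class PatternSketches
{
    // Cache key: SHA-256 over file name + content, rendered as lowercase hex.
    public static string ComputeKey(string fileName, string content)
    {
        byte[] input = Encoding.UTF8.GetBytes(fileName + content);
        byte[] hash = SHA256.HashData(input);
        return Convert.ToHexString(hash).ToLowerInvariant();
    }

    // Atomic write: write a sibling .tmp file, then move it over the target.
    public static void WriteAtomically(string path, string payload)
    {
        string tmp = path + ".tmp";
        File.WriteAllText(tmp, payload);
        File.Move(tmp, path, overwrite: true);
    }

    // Exponential backoff with full jitter (assumed variant):
    // a random delay in [0, min(cap, base * 2^attempt)).
    public static TimeSpan BackoffDelay(
        int attempt, Random rng, double baseSeconds = 1.0, double capSeconds = 30.0)
    {
        double ceiling = Math.Min(capSeconds, baseSeconds * Math.Pow(2, attempt));
        return TimeSpan.FromSeconds(rng.NextDouble() * ceiling);
    }
}
```

Because the key depends only on the file name and content, an unchanged file hashes to the same key and its cached embedding can be reused without an API call. Writing to a temporary file first means a crash mid-write leaves the previous cache file untouched.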

Prerequisites

- .NET 10 SDK
- OpenAI API key with access to chat and embedding models

Setup

1. Clone the repository.
2. Create a .env file in the project root:
   ```
   OPENAI_API_KEY=sk-...
   ```
3. Place the forensic data files:
   - phonebook.txt -- raw phone book export
   - chats/ -- raw chat export directory
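
For readers unfamiliar with .env files, the format is plain `KEY=VALUE` lines. The sketch below shows one way such a file could be parsed; it is an assumption for illustration only, since this README does not say how the application actually loads the file (it may well use a library). `DotEnvSketch` and `Parse` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class DotEnvSketch
{
    // Parses KEY=VALUE lines; blank lines and lines starting with '#' are skipped.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var values = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0 || line.StartsWith('#')) continue;

            int eq = line.IndexOf('=');
            if (eq <= 0) continue; // no '=' at all, or nothing before it

            string key = line[..eq].Trim();
            string value = line[(eq + 1)..].Trim();
            values[key] = value; // a later duplicate overwrites an earlier one
        }
        return values;
    }
}
```

A caller could pass `File.ReadLines(".env")` to `Parse` and copy each pair into the process with `Environment.SetEnvironmentVariable`.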

Usage

Interactive mode:

dotnet run

Commands in interactive mode:

| Command | Description |
| --- | --- |
| `exit` / `quit` | Close the application |
| `exit with save` | Save conversation history as JSON and exit |
| `/clear` | Reset conversation context |
| `/help` | Show help |
| `Ctrl+C` | Cancel the current query |

Running tests:

dotnet test                                          # all tests
dotnet test --filter "FullyQualifiedName~Unit"       # unit only
dotnet test --filter "FullyQualifiedName~Functional" # functional only
dotnet test --verbosity normal                       # verbose output

Validation test suite:

dotnet run -- --test

Runs 28 automated test cases covering:

- Phone book queries (contacts, phone numbers, chat file IDs)
- Temporal filtering (monthly activity, busiest day)
- Aggregate queries (all contacts in a period, cross-file counts)
- Comparison and exclusion queries
- Misattribution detection
- Single-chat-max and timeline edge cases

Improvement TODO:

- Linux
- Page Index
- GraphRAG
- RAPTOR
- MCP (drop interactive CLI)
- CRUD operations for RAG data
- Mutation tests
- Line coverage minimum 95%, mutation coverage minimum 95%
- Dockerfile

