A production-ready Retrieval-Augmented Generation (RAG) application built with Spring AI 2.0, Ollama, and PGVector. Upload your documents (PDFs, text files), chunk and embed them into a vector store, and chat with an LLM whose answers are grounded in your uploaded content — minimizing hallucinations.
This project is designed as a learning resource for developers exploring Spring AI and building RAG pipelines.
```
┌─────────────────────────────────────────────────────────┐
│                     React Frontend                      │
│           (Vite + React Router + CSS Modules)           │
│                                                         │
│  ┌──────────┐   ┌──────────────┐   ┌───────────────┐    │
│  │ HomePage │   │  UploadPage  │   │   ChatPage    │    │
│  └──────────┘   └──────┬───────┘   └───────┬───────┘    │
└────────────────────────┼───────────────────┼────────────┘
                         │ /api/upload       │ /api/chat
                         ▼                   ▼
┌─────────────────────────────────────────────────────────┐
│                   Spring Boot Backend                   │
│                                                         │
│  ┌────────────────────┐      ┌──────────────────────┐   │
│  │FileUploadController│      │    ChatController    │   │
│  │  POST /api/upload  │      │    POST /api/chat    │   │
│  │  GET /api/status   │      │                      │   │
│  └─────────┬──────────┘      └──────────┬───────────┘   │
│            │                            │               │
│  ┌─────────▼──────────┐      ┌──────────▼───────────┐   │
│  │DataIngestionService│      │    QuestionAnswer    │   │
│  │AsyncIngestion      │      │       Advisor        │   │
│  │Processor (@Async)  │      │ (similarity search)  │   │
│  └─────────┬──────────┘      └──────────┬───────────┘   │
│            │                            │               │
│  ┌─────────▼────────────────────────────▼───────────┐   │
│  │             PGVector (Vector Store)              │   │
│  │      Embeddings via Ollama nomic-embed-text      │   │
│  └─────────────────────────┬────────────────────────┘   │
│                            │                            │
│                   ┌────────▼────────┐                   │
│                   │   Ollama LLM    │                   │
│                   │   llama3.2:1b   │                   │
│                   └─────────────────┘                   │
└─────────────────────────────────────────────────────────┘
```
| Layer | Technology | Purpose |
|---|---|---|
| Backend | Spring Boot 4.0 + Spring AI 2.0 | REST API, AI orchestration |
| LLM | Ollama (any model) | Local inference, no API keys needed. Default: llama3.2:1b |
| Embeddings | Ollama (nomic-embed-text) | Document embedding for similarity search |
| Vector Store | PGVector (PostgreSQL extension) | Stores and queries document embeddings |
| Document Parsing | Apache Tika | Extracts text from PDFs, DOCX, TXT, etc. |
| Frontend | React 19 + Vite + React Router | SPA with modular dark-themed UI |
| Infrastructure | Docker Compose (Spring Boot managed) | Auto-started by Spring Boot on app launch |
This project demonstrates several key Spring AI features:
```java
// ChatController.java
this.chatClient = ChatClient.builder(ollamaChatModel)
        .defaultSystem(SYSTEM_PROMPT)
        .defaultAdvisors(/* ... */)
        .build();
```

The `ChatClient` is Spring AI's high-level abstraction for interacting with LLMs. You configure it once with a system prompt and advisors, then call `.prompt().user(message).call().content()` for each request.
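A per-request call might look like this in a handler method — a sketch only: beyond the `POST /api/chat` endpoint described in this README, the method name and parameter style are assumptions, not the project's actual code.

```java
// Hypothetical handler sketch using the configured ChatClient.
@PostMapping("/api/chat")
public String chat(@RequestBody String question) {
    return chatClient.prompt()
            .user(question)   // the user's question
            .call()           // blocking call to the LLM
            .content();       // extract the response text
}
```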
```java
QuestionAnswerAdvisor.builder(vectorStore)
        .searchRequest(SearchRequest.builder()
                .topK(3)
                .similarityThreshold(0.7)
                .build())
        .build()
```

This is the core of the RAG pipeline. The `QuestionAnswerAdvisor` automatically:
- Takes the user's question
- Performs a similarity search against the vector store
- Injects the retrieved document chunks as CONTEXT into the prompt
- Sends the augmented prompt to the LLM
- `topK(3)` — retrieve the top 3 most similar chunks (tradeoff: more chunks = more context, but slower inference).
- `similarityThreshold(0.7)` — only include chunks with ≥70% similarity (filters out irrelevant noise).
MultipartFile → Apache Tika → TikaDocumentReader → TextSplitter → VectorStore
- `TikaDocumentReader` — Spring AI's integration with Apache Tika. Reads any supported file format (PDF, DOCX, TXT, HTML) and produces `Document` objects.
- `TokenTextSplitter` — splits documents into chunks by token count, respecting sentence boundaries.
- `VectorStore.accept()` — embeds chunks using the configured embedding model and stores them in PGVector.
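The pipeline can be sketched in a few lines using Spring AI's `TikaDocumentReader`, `TokenTextSplitter`, and `VectorStore` APIs. Note this is a rough sketch — the variable names and the exact structure of `DataIngestionServiceImpl` are assumptions:

```java
// Rough sketch of the ingestion flow — the real service may be structured differently.
Resource resource = new ByteArrayResource(fileBytes);           // uploaded file contents
List<Document> parsed = new TikaDocumentReader(resource).get(); // Tika extracts the text
List<Document> chunks = textSplitter.apply(parsed);             // split into token-sized chunks
vectorStore.accept(chunks);                                     // embed + store in PGVector
```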
```java
// ChunkingConfig.java
@Bean
public TextSplitter textSplitter() {
    return new TokenTextSplitter(
            chunkSize,             // 300 tokens per chunk
            minChunkSizeChars,     // minimum 100 characters
            minChunkLengthToEmbed, // skip chunks < 50 chars
            maxNumChunks,          // cap at 5000 chunks per document
            keepSeparator,         // preserve sentence boundaries
            List.of('.', '!', '?', '\n') // split on punctuation
    );
}
```

Chunking is critical for RAG quality. Too large = irrelevant context; too small = lost meaning. The values are externalized to `application.properties` so you can tune without recompiling.
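To make the tradeoff concrete, here is a toy, framework-free chunker — not Spring AI's `TokenTextSplitter`, just an illustration — that packs sentences into chunks capped by a rough word count and drops chunks below a minimum character length:

```java
import java.util.ArrayList;
import java.util.List;

// Toy chunker — NOT Spring's TokenTextSplitter; just illustrates the size tradeoff.
public class ToyChunker {

    // Split text at sentence-ending punctuation, then pack sentences into
    // chunks of at most maxWords words, dropping chunks shorter than minChars.
    public static List<String> chunk(String text, int maxWords, int minChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int words = 0;
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            int sentenceWords = sentence.split("\\s+").length;
            if (words + sentenceWords > maxWords && current.length() > 0) {
                if (current.length() >= minChars) chunks.add(current.toString().trim());
                current.setLength(0);
                words = 0;
            }
            current.append(sentence).append(' ');
            words += sentenceWords;
        }
        if (current.length() >= minChars) chunks.add(current.toString().trim());
        return chunks;
    }

    public static void main(String[] args) {
        String text = "Spring AI wires the RAG pipeline. Chunking controls retrieval quality. "
                + "Small chunks lose meaning. Large chunks dilute relevance.";
        System.out.println(chunk(text, 8, 10));
    }
}
```

Raising `maxWords` here behaves like raising `chunk-size`: fewer, fatter chunks, each more likely to carry text unrelated to a given query.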
```properties
spring.ai.vectorstore.pgvector.initialize-schema=true
```

Spring AI auto-creates the `vector_store` table in PostgreSQL with the pgvector extension. No manual SQL needed.
```properties
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.model=llama3.2:1b
spring.ai.ollama.chat.options.num-ctx=2048
spring.ai.ollama.chat.options.temperature=0.1
spring.ai.ollama.init.pull-model-strategy=when_missing
```

`model` — any model available on Ollama's model library works. Just change the value:

- `llama3.2:1b` — lightweight, fast on CPU (~4s responses)
- `llama3.2:3b` — better quality, still runs on most machines
- `llama3.1:8b` — high quality, needs ~8GB RAM
- `mistral:7b` — strong general-purpose alternative
- `gemma2:9b` — Google's model, good at instruction following
- `phi3:mini` — Microsoft's compact model

`pull-model-strategy=when_missing` — automatically downloads the chosen model on first run.

`num-ctx=2048` — context window size (tokens). Larger = can process more context, but slower.

`temperature=0.1` — low temperature for factual, grounded answers (less creative, more accurate).
File ingestion can take minutes for large documents. The upload endpoint returns a job ID immediately while processing continues in the background:
```java
// AsyncIngestionProcessor.java — separate @Component bean
@Async
public void process(String jobId, byte[][] fileBytes, String[] fileNames, Map<String, JobStatus> jobs) {
    // parse, chunk, embed — runs on a separate thread
}
```

**Important:** Spring's `@Async` uses proxy-based AOP. Calling an `@Async` method from within the same class bypasses the proxy and runs synchronously. That's why the async logic lives in a separate `@Component` bean.
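The pattern looks roughly like this — class names below are hypothetical, not the project's actual classes:

```java
// Hypothetical illustration of the self-invocation pitfall.
@Service
class IngestionService {

    private final AsyncWorker worker;

    IngestionService(AsyncWorker worker) {
        this.worker = worker;
    }

    // WRONG: an @Async method declared on THIS class and called as
    // this.doWork(jobId) would bypass the Spring proxy and run synchronously.

    public void start(String jobId) {
        worker.doWork(jobId); // RIGHT: the call goes through the injected proxy bean
    }
}

@Component
class AsyncWorker {
    @Async
    public void doWork(String jobId) {
        // heavy processing runs on Spring's task executor thread
    }
}
```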
```xml
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-docker-compose</artifactId>
</dependency>
```

This is one of the most powerful features in this project. You don't need to run `docker compose up` manually. When you start the Spring Boot application with `./mvnw spring-boot:run`, Spring Boot:

- Detects `compose.yaml` in the project root
- Automatically runs `docker compose up` to start Ollama and PGVector
- Reads the container connection details (ports, credentials)
- Auto-configures the datasource, vector store, and Ollama base URL

When the application shuts down, it also stops the Docker containers. The entire infrastructure lifecycle is managed by Spring Boot — zero manual Docker commands needed for development.
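For reference, a minimal `compose.yaml` driving this setup might look like the following. This is a sketch: the service names, image tags, ports, and credentials here are assumptions, and the project's actual file may differ:

```yaml
# Hypothetical minimal compose.yaml — the project's actual file may differ.
services:
  ollama:
    image: ollama/ollama
    ports:
      - "11434:11434"
  pgvector:
    image: pgvector/pgvector:pg16
    environment:
      POSTGRES_DB: ragdb
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    ports:
      - "5432:5432"
```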
```
├── compose.yaml                      # Auto-started by Spring Boot on app launch
├── pom.xml                           # Maven dependencies (Spring AI 2.0, Tika, PGVector)
├── src/main/java/ai/assistant/bot/
│   ├── BotApplication.java           # Spring Boot entry point
│   ├── config/
│   │   ├── AsyncConfig.java          # Enables @Async support
│   │   └── ChunkingConfig.java       # TokenTextSplitter bean configuration
│   ├── controller/
│   │   ├── ChatController.java       # POST /api/chat — RAG chat endpoint
│   │   └── FileUploadController.java # POST /api/upload — async file ingestion
│   ├── model/
│   │   └── JobStatus.java            # Java record for ingestion job tracking
│   └── service/
│       ├── DataIngestionService.java     # Interface
│       ├── DataIngestionServiceImpl.java # Orchestrates upload + async handoff
│       └── AsyncIngestionProcessor.java  # @Async document processing
├── src/main/resources/
│   └── application.properties        # All configuration (Ollama, PGVector, chunking)
└── ContextAI/                        # React frontend
    ├── Dockerfile                    # Multi-stage build (Node → Nginx)
    ├── nginx.conf                    # SPA routing + API proxy
    ├── src/
    │   ├── App.jsx                   # State management + routing
    │   ├── components/
    │   │   ├── navbar/               # Navigation bar
    │   │   ├── hero/                 # Homepage hero panel
    │   │   ├── upload/               # File upload widget
    │   │   ├── status/               # Ingestion job status cards
    │   │   └── chat/                 # Chat interface
    │   └── pages/
    │       ├── HomePage.jsx          # Landing page with app description
    │       ├── UploadPage.jsx        # Document upload + status tracking
    │       └── ChatPage.jsx          # Chat with your documents
    └── vite.config.js                # Dev server proxy to backend
```
- Java 25+ (Amazon Corretto or any JDK)
- Docker & Docker Compose (for Ollama and PGVector)
- Node.js 20+ (for frontend development)
- Maven (or use the included `mvnw` wrapper)
```bash
git clone https://github.com/Siddharthpratapsingh/ContextAI-SpringAI.git
cd ContextAI-SpringAI
./mvnw spring-boot:run
```

This automatically:

- Detects `compose.yaml` and runs `docker compose up` (Ollama + PGVector)
- Auto-configures datasource and Ollama connections from the running containers
- Downloads the `llama3.2:1b` model if missing
- Creates the vector store schema in PostgreSQL
- Starts the API server on `http://localhost:8080`

Note: You don't need to run `docker compose up` separately. Spring Boot manages the entire Docker lifecycle.
```bash
cd ContextAI
npm install
npm run dev
```

The frontend runs at `http://localhost:5173`, with API calls proxied to the backend.
```bash
docker compose up --build frontend
```

The frontend is then available at `http://localhost:3000`. Note that the backend already manages the Ollama and PGVector containers via Spring Boot's Docker Compose integration — this command is only needed if you want to run the frontend in a container instead of using `npm run dev`.
| Method | Endpoint | Description |
|---|---|---|
| POST | `/api/upload` | Upload files (multipart). Returns a job ID immediately. |
| GET | `/api/ingestion/status/{jobId}` | Check ingestion job status (PROCESSING / COMPLETED / FAILED). |
| DELETE | `/api/ingestion/status/{jobId}` | Remove a completed job from tracking. |
| POST | `/api/chat` | Send a question (plain text body). Returns a RAG-grounded answer. |
```bash
curl -X POST http://localhost:8080/api/upload \
  -F "file=@my-document.pdf"
```

Response:

```json
{"jobId": "a1b2c3d4-...", "status": "PROCESSING", "message": "Ingestion in progress"}
```

```bash
curl -X POST http://localhost:8080/api/chat \
  -H "Content-Type: text/plain" \
  -d "What are the key points in the uploaded document?"
```

All configuration lives in `src/main/resources/application.properties`:
```properties
# Ollama — swap the model to any from https://ollama.com/library
spring.ai.ollama.base-url=http://localhost:11434
# Model — try: llama3.1:8b, mistral:7b, gemma2:9b
spring.ai.ollama.chat.options.model=llama3.2:1b
# Context window (tokens)
spring.ai.ollama.chat.options.num-ctx=2048
# Lower = more factual
spring.ai.ollama.chat.options.temperature=0.1

# Vector Store
spring.ai.vectorstore.pgvector.initialize-schema=true

# Chunking (tune these for your documents)
# Tokens per chunk
rag.chunking.chunk-size=300
# Minimum characters per chunk
rag.chunking.min-chunk-size-chars=100
# Skip tiny chunks
rag.chunking.min-chunk-length-to-embed=50
# Max chunks per document
rag.chunking.max-num-chunks=5000

# File Upload
spring.servlet.multipart.max-file-size=50MB
spring.servlet.multipart.max-request-size=50MB
```

(In `.properties` files, `#` only starts a comment at the beginning of a line — a trailing `# ...` would become part of the value — so each comment sits above its key.)

| Parameter | Effect | Tradeoff |
|---|---|---|
| `topK` (SearchRequest) | Number of chunks retrieved | More = richer context, but slower LLM inference |
| `similarityThreshold` | Minimum relevance score (0.0–1.0) | Higher = more precise, but may miss relevant chunks |
| `chunk-size` | Tokens per chunk | Larger = more context per chunk, but less precise retrieval |
| `num-ctx` | LLM context window | Larger = can process more chunks, but uses more memory/time |
| `temperature` | LLM creativity | Lower = more factual; higher = more creative |
- **Spring AI makes RAG simple** — `QuestionAnswerAdvisor` handles the entire retrieve-augment-generate pipeline in one line.
- **Ollama runs locally** — no API keys, no cloud costs, full privacy. Swap models by changing one property (`spring.ai.ollama.chat.options.model`). Browse available models at ollama.com/library.
- **`@Async` needs separate beans** — Spring's proxy-based AOP doesn't intercept self-invocations. Always put `@Async` methods in a different `@Component`.
- **Chunk size matters** — too large and retrieval returns irrelevant context; too small and you lose semantic meaning.
- **Small models need simple prompts** — `llama3.2:1b` can't follow complex multi-rule system prompts. Keep instructions short and direct.
- **Docker Compose integration** — Spring Boot auto-detects `compose.yaml`, starts the containers on app launch, auto-configures connections, and stops them on shutdown. No manual `docker compose up` needed.
MIT
Contributions are welcome! Feel free to open issues or submit pull requests.