ContextAI — RAG Application with Spring AI, Ollama & PGVector

A production-ready Retrieval-Augmented Generation (RAG) application built with Spring AI 2.0, Ollama, and PGVector. Upload your documents (PDFs, text files), chunk and embed them into a vector store, and chat with an LLM that grounds its answers in your uploaded content, sharply reducing hallucinations.

This project is designed as a learning resource for developers exploring Spring AI and building RAG pipelines.


Architecture

┌─────────────────────────────────────────────────────────┐
│                    React Frontend                       │
│         (Vite + React Router + CSS Modules)             │
│                                                         │
│   ┌──────────┐   ┌──────────────┐   ┌───────────────┐   │
│   │ HomePage │   │  UploadPage  │   │   ChatPage    │   │
│   └──────────┘   └──────┬───────┘   └───────┬───────┘   │
└─────────────────────────────────────────────┼───────────┘
                          │  /api/upload      │  /api/chat
                          ▼                   ▼
┌─────────────────────────────────────────────────────────┐
│                 Spring Boot Backend                     │
│                                                         │
│   ┌────────────────────┐     ┌──────────────────────┐   │
│   │FileUploadController│     │   ChatController     │   │
│   │  POST /api/upload  │     │   POST /api/chat     │   │
│   │  GET  /api/status  │     │                      │   │
│   └─────────┬──────────┘     └──────────┬───────────┘   │
│             │                           │               │
│   ┌─────────▼──────────┐     ┌──────────▼───────────┐   │
│   │DataIngestionService│     │  QuestionAnswer      │   │
│   │AsyncIngestion      │     │  Advisor             │   │
│   │Processor (@Async)  │     │  (similarity search) │   │
│   └─────────┬──────────┘     └──────────┬───────────┘   │
│             │                           │               │
│   ┌─────────▼───────────────────────────▼───────────┐   │
│   │              PGVector (Vector Store)            │   │
│   │         Embeddings via Ollama nomic-embed-text  │   │
│   └─────────────────────────────────────────────────┘   │
│                           │                             │
│                  ┌────────▼────────┐                    │
│                  │  Ollama LLM     │                    │
│                  │  llama3.2:1b    │                    │
│                  └─────────────────┘                    │
└─────────────────────────────────────────────────────────┘

Tech Stack

| Layer | Technology | Purpose |
|---|---|---|
| Backend | Spring Boot 4.0 + Spring AI 2.0 | REST API, AI orchestration |
| LLM | Ollama (any model; default llama3.2:1b) | Local inference, no API keys needed |
| Embeddings | Ollama (nomic-embed-text) | Document embedding for similarity search |
| Vector Store | PGVector (PostgreSQL extension) | Stores and queries document embeddings |
| Document Parsing | Apache Tika | Extracts text from PDFs, DOCX, TXT, etc. |
| Frontend | React 19 + Vite + React Router | SPA with modular dark-themed UI |
| Infrastructure | Docker Compose (Spring Boot managed) | Auto-started by Spring Boot on app launch |

Spring AI Concepts Covered

This project demonstrates several key Spring AI features:

1. Chat Client with System Prompt

// ChatController.java
this.chatClient = ChatClient.builder(ollamaChatModel)
        .defaultSystem(SYSTEM_PROMPT)
        .defaultAdvisors(/* ... */)
        .build();

The ChatClient is Spring AI's high-level abstraction for interacting with LLMs. You configure it once with a system prompt and advisors, then call .prompt().user(message).call().content() for each request.

2. QuestionAnswerAdvisor (RAG)

QuestionAnswerAdvisor.builder(vectorStore)
        .searchRequest(SearchRequest.builder()
                .topK(3)
                .similarityThreshold(0.7)
                .build())
        .build()

This is the core of the RAG pipeline. The QuestionAnswerAdvisor automatically:

  1. Takes the user's question
  2. Performs a similarity search against the vector store
  3. Injects the retrieved document chunks as CONTEXT into the prompt
  4. Sends the augmented prompt to the LLM
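
Step 3 can be pictured as simple prompt templating. The sketch below is illustrative only (Spring AI's advisor uses its own internal prompt template), but it shows the shape of the augmented prompt the LLM ultimately sees:

```java
import java.util.List;

// Illustrative sketch of the augmentation step, NOT Spring AI's actual template:
// retrieved chunks are stitched into a CONTEXT block ahead of the user's question.
public class PromptAugmentationSketch {
    static String augment(String question, List<String> retrievedChunks) {
        StringBuilder prompt = new StringBuilder("Answer using only the CONTEXT below.\n\nCONTEXT:\n");
        for (String chunk : retrievedChunks) {
            prompt.append("- ").append(chunk).append('\n');
        }
        prompt.append("\nQUESTION: ").append(question);
        return prompt.toString();
    }

    public static void main(String[] args) {
        System.out.println(augment("What is the refund policy?",
                List.of("Refunds are issued within 30 days.",
                        "Contact support to start a refund.")));
    }
}
```

Because the model is told to answer only from the CONTEXT block, answers stay grounded in the retrieved chunks.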

topK(3) — Retrieve top 3 most similar chunks (tradeoff: more chunks = more context but slower inference).

similarityThreshold(0.7) — Only include chunks whose similarity score is at least 0.7 (filters out irrelevant noise).
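
The similarity score itself typically comes from a vector-distance measure such as cosine similarity between the question's embedding and each chunk's embedding; PGVector computes this in-database. A minimal sketch, just to make the score scale concrete:

```java
// Illustrative cosine similarity between two embedding vectors, the kind of
// score the similarityThreshold filter is applied to. PGVector computes this
// inside PostgreSQL; this sketch only shows what the number means.
public class CosineSimilaritySketch {
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];   // numerator: element-wise product
            na  += a[i] * a[i];   // squared norm of a
            nb  += b[i] * b[i];   // squared norm of b
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    public static void main(String[] args) {
        double[] v1 = {0.2, 0.7, 0.1};
        double[] v2 = {0.2, 0.7, 0.1};   // identical direction: similarity 1.0
        double[] v3 = {0.9, -0.1, 0.4};  // different direction: much lower score
        System.out.printf("%.3f vs %.3f%n", cosine(v1, v2), cosine(v1, v3));
    }
}
```

Real embedding vectors have hundreds of dimensions (nomic-embed-text produces 768), but the arithmetic is the same.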

3. Document Ingestion Pipeline

MultipartFile → TikaDocumentReader (Apache Tika) → TokenTextSplitter → VectorStore
  • TikaDocumentReader — Spring AI's integration with Apache Tika. Reads any supported file format (PDF, DOCX, TXT, HTML) and produces Document objects.
  • TokenTextSplitter — Splits documents into chunks by token count, respecting sentence boundaries.
  • VectorStore.accept() — Embeds chunks using the configured embedding model and stores them in PGVector.

4. Configurable Chunking Strategy

// ChunkingConfig.java
@Bean
public TextSplitter textSplitter() {
    return new TokenTextSplitter(
        chunkSize,              // 300 tokens per chunk
        minChunkSizeChars,      // minimum 100 characters per chunk
        minChunkLengthToEmbed,  // skip chunks shorter than 50 characters
        maxNumChunks,           // cap at 5000 chunks per document
        keepSeparator           // preserve sentence boundaries when splitting
    );
}

Chunking is critical for RAG quality. Too large = irrelevant context; too small = lost meaning. The values are externalized to application.properties so you can tune without recompiling.
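
To make the tradeoff concrete, here is a deliberately simplified splitter. It is not TokenTextSplitter's real algorithm (it counts words, not tokens), but it shows the same boundary-respecting idea: pack whole sentences into chunks up to a size budget.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified splitter: greedily packs whole sentences into chunks
// up to a word budget, so no chunk ever cuts a sentence in half.
public class SentenceChunkerSketch {
    static List<String> chunk(String text, int maxWords) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int words = 0;
        // Split after sentence-ending punctuation, keeping the punctuation.
        for (String sentence : text.split("(?<=[.!?])\\s+")) {
            int w = sentence.split("\\s+").length;
            if (words + w > maxWords && words > 0) { // budget exceeded: flush chunk
                chunks.add(current.toString().trim());
                current.setLength(0);
                words = 0;
            }
            current.append(sentence).append(' ');
            words += w;
        }
        if (words > 0) chunks.add(current.toString().trim());
        return chunks;
    }

    public static void main(String[] args) {
        // Three 3-word sentences with a 6-word budget -> two chunks.
        System.out.println(chunk("A b c. D e f. G h i.", 6));
    }
}
```

Raising maxWords here is analogous to raising rag.chunking.chunk-size: each chunk carries more context, but retrieval becomes less precise.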

5. PGVector Auto-Configuration

spring.ai.vectorstore.pgvector.initialize-schema=true

Spring AI auto-creates the vector_store table in PostgreSQL with the pgvector extension. No manual SQL needed.

6. Ollama Integration

spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.model=llama3.2:1b
spring.ai.ollama.chat.options.num-ctx=2048
spring.ai.ollama.chat.options.temperature=0.1
spring.ai.ollama.init.pull-model-strategy=when_missing
  • model — Any model available on Ollama's model library works. Just change the value:
    • llama3.2:1b — Lightweight, fast on CPU (~4s responses)
    • llama3.2:3b — Better quality, still runs on most machines
    • llama3.1:8b — High quality, needs ~8GB RAM
    • mistral:7b — Strong general-purpose alternative
    • gemma2:9b — Google's model, good at instruction following
    • phi3:mini — Microsoft's compact model
  • pull-model-strategy=when_missing — Automatically downloads the chosen model on first run.
  • num-ctx=2048 — Context window size (tokens). Larger = can process more context but slower.
  • temperature=0.1 — Low temperature for factual, grounded answers (less creative, more accurate).
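
The temperature effect can be seen directly in the sampling math: token logits are divided by the temperature before softmax, so small values concentrate probability on the top token. An illustrative sketch (real samplers also apply top-k, top-p, and other filters):

```java
// Temperature-scaled softmax: logits / T, then normalize. Low T sharpens the
// distribution toward the most likely token, which is why temperature=0.1
// yields factual, near-deterministic answers.
public class TemperatureSketch {
    static double[] softmax(double[] logits, double temperature) {
        double[] p = new double[logits.length];
        double max = Double.NEGATIVE_INFINITY, sum = 0;
        for (double l : logits) max = Math.max(max, l / temperature); // for numeric stability
        for (int i = 0; i < logits.length; i++) {
            p[i] = Math.exp(logits[i] / temperature - max);
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) p[i] /= sum;
        return p;
    }

    public static void main(String[] args) {
        double[] logits = {2.0, 1.0, 0.5};
        System.out.printf("top-token probability: T=1.0 -> %.2f, T=0.1 -> %.2f%n",
                softmax(logits, 1.0)[0], softmax(logits, 0.1)[0]);
        // at T=0.1 the top token's probability approaches 1.0
    }
}
```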

7. Async Processing with @Async

File ingestion can take minutes for large documents. The upload endpoint returns a job ID immediately while processing continues in the background:

// AsyncIngestionProcessor.java — separate @Component bean
@Async
public void process(String jobId, byte[][] fileBytes, String[] fileNames, Map<String, JobStatus> jobs) {
    // parse, chunk, embed — runs on a separate thread
}

Important: Spring's @Async uses proxy-based AOP. Calling an @Async method from within the same class bypasses the proxy and runs synchronously. That's why the async logic is in a separate @Component bean.
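
The pitfall is easy to reproduce with a plain JDK dynamic proxy, which mimics the wrapping Spring performs on the bean. In this sketch the intercepted behavior is just a counter rather than an executor hand-off:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Why self-invocation bypasses @Async: only calls that enter THROUGH the proxy
// get the extra behavior. Calls a bean makes on itself go straight to the target.
public class SelfInvocationDemo {
    interface Service { void outer(); void inner(); }

    static class ServiceImpl implements Service {
        int intercepted = 0;             // bumped only when a call goes through the proxy
        public void outer() { inner(); } // self-invocation: this.inner() skips the proxy
        public void inner() { }
    }

    /** Calls outer() through the proxy; returns how many calls were intercepted. */
    static int interceptedCallsForOuter() {
        ServiceImpl target = new ServiceImpl();
        InvocationHandler handler = (proxy, method, methodArgs) -> {
            target.intercepted++;        // stand-in for "submit to async executor"
            return method.invoke(target, methodArgs);
        };
        Service proxied = (Service) Proxy.newProxyInstance(
                Service.class.getClassLoader(), new Class<?>[]{Service.class}, handler);
        proxied.outer();                 // outer() is intercepted...
        return target.intercepted;       // ...but the nested inner() call was not
    }

    public static void main(String[] args) {
        System.out.println(interceptedCallsForOuter()); // prints 1, not 2
    }
}
```

Moving inner() into a separate bean means every call to it crosses a proxy boundary, which is exactly what AsyncIngestionProcessor does.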

8. Spring Boot Docker Compose Integration

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-docker-compose</artifactId>
</dependency>

This is one of the most powerful features in this project. You don't need to run docker compose up manually. When you start the Spring Boot application with ./mvnw spring-boot:run, Spring Boot:

  1. Detects compose.yaml in the project root
  2. Automatically runs docker compose up to start Ollama and PGVector
  3. Reads the container connection details (ports, credentials)
  4. Auto-configures the datasource, vector store, and Ollama base URL

When the application shuts down, it also stops the Docker containers. This means the entire infrastructure lifecycle is managed by Spring Boot — zero manual Docker commands needed for development.
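
For reference, a compose.yaml for this setup typically looks like the sketch below. This is an assumption, not the repository's actual file (service names, image tags, and credentials may differ), but Spring Boot's service-connection support recognizes containers by image name, e.g. pgvector/pgvector and ollama/ollama:

```yaml
# Hypothetical compose.yaml sketch; Spring Boot detects these services on startup.
services:
  ollama:
    image: ollama/ollama:latest    # recognized for spring.ai.ollama.base-url
    ports:
      - "11434:11434"
  pgvector:
    image: pgvector/pgvector:pg16  # recognized for the datasource + vector store
    environment:
      POSTGRES_USER: postgres      # placeholder credentials for local development
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: vectordb
    ports:
      - "5432:5432"
```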


Project Structure

├── compose.yaml                          # Auto-started by Spring Boot on app launch
├── pom.xml                               # Maven dependencies (Spring AI 2.0, Tika, PGVector)
├── src/main/java/ai/assistant/bot/
│   ├── BotApplication.java               # Spring Boot entry point
│   ├── config/
│   │   ├── AsyncConfig.java              # Enables @Async support
│   │   └── ChunkingConfig.java           # TokenTextSplitter bean configuration
│   ├── controller/
│   │   ├── ChatController.java           # POST /api/chat — RAG chat endpoint
│   │   └── FileUploadController.java     # POST /api/upload — async file ingestion
│   ├── model/
│   │   └── JobStatus.java                # Java record for ingestion job tracking
│   └── service/
│       ├── DataIngestionService.java      # Interface
│       ├── DataIngestionServiceImpl.java  # Orchestrates upload + async handoff
│       └── AsyncIngestionProcessor.java   # @Async document processing
├── src/main/resources/
│   └── application.properties            # All configuration (Ollama, PGVector, chunking)
└── ContextAI/                            # React frontend
    ├── Dockerfile                        # Multi-stage build (Node → Nginx)
    ├── nginx.conf                        # SPA routing + API proxy
    ├── src/
    │   ├── App.jsx                       # State management + routing
    │   ├── components/
    │   │   ├── navbar/                   # Navigation bar
    │   │   ├── hero/                     # Homepage hero panel
    │   │   ├── upload/                   # File upload widget
    │   │   ├── status/                   # Ingestion job status cards
    │   │   └── chat/                     # Chat interface
    │   └── pages/
    │       ├── HomePage.jsx              # Landing page with app description
    │       ├── UploadPage.jsx            # Document upload + status tracking
    │       └── ChatPage.jsx              # Chat with your documents
    └── vite.config.js                    # Dev server proxy to backend

Getting Started

Prerequisites

  • Java 25+ (Amazon Corretto or any JDK)
  • Docker & Docker Compose (for Ollama and PGVector)
  • Node.js 20+ (for frontend development)
  • Maven (or use the included mvnw wrapper)

1. Clone the repository

git clone https://github.com/Siddharthpratapsingh/ContextAI-SpringAI.git
cd ContextAI-SpringAI

2. Start the backend

./mvnw spring-boot:run

This automatically:

  • Detects compose.yaml and runs docker compose up (Ollama + PGVector)
  • Auto-configures datasource and Ollama connections from the running containers
  • Downloads llama3.2:1b model if missing
  • Creates the vector store schema in PostgreSQL
  • Starts the API server on http://localhost:8080

Note: You don't need to run docker compose up separately. Spring Boot manages the entire Docker lifecycle.

3. Start the frontend (development)

cd ContextAI
npm install
npm run dev

Frontend runs at http://localhost:5173 with API calls proxied to the backend.

4. Or build the frontend Docker image separately

docker compose up --build frontend

Frontend available at http://localhost:3000. Note that the backend already manages Ollama and PGVector containers via Spring Boot Docker Compose integration — this command is only needed if you want to run the frontend in a container instead of using npm run dev.


API Endpoints

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/upload | Upload files (multipart). Returns a job ID immediately. |
| GET | /api/ingestion/status/{jobId} | Check ingestion job status (PROCESSING / COMPLETED / FAILED). |
| DELETE | /api/ingestion/status/{jobId} | Remove a completed job from tracking. |
| POST | /api/chat | Send a question (plain text body). Returns a RAG-grounded answer. |

Example: Upload a document

curl -X POST http://localhost:8080/api/upload \
  -F "file=@my-document.pdf"

Response:

{"jobId": "a1b2c3d4-...", "status": "PROCESSING", "message": "Ingestion in progress"}

Example: Chat with your documents

curl -X POST http://localhost:8080/api/chat \
  -H "Content-Type: text/plain" \
  -d "What are the key points in the uploaded document?"

Configuration Reference

All configuration lives in src/main/resources/application.properties:

# Ollama — swap the model to any from https://ollama.com/library
# (alternatives to try: llama3.1:8b, mistral:7b, gemma2:9b)
spring.ai.ollama.base-url=http://localhost:11434
spring.ai.ollama.chat.options.model=llama3.2:1b
# Context window (tokens)
spring.ai.ollama.chat.options.num-ctx=2048
# Lower = more factual
spring.ai.ollama.chat.options.temperature=0.1

# Vector Store
spring.ai.vectorstore.pgvector.initialize-schema=true

# Chunking (tune these for your documents)
# Tokens per chunk
rag.chunking.chunk-size=300
# Minimum characters per chunk
rag.chunking.min-chunk-size-chars=100
# Skip chunks shorter than this
rag.chunking.min-chunk-length-to-embed=50
# Max chunks per document
rag.chunking.max-num-chunks=5000

# File Upload
spring.servlet.multipart.max-file-size=50MB
spring.servlet.multipart.max-request-size=50MB

Performance Tuning

| Parameter | Effect | Tradeoff |
|---|---|---|
| topK (SearchRequest) | Number of chunks retrieved | More = richer context but slower LLM inference |
| similarityThreshold | Minimum relevance score (0.0–1.0) | Higher = more precise but may miss relevant chunks |
| chunk-size | Tokens per chunk | Larger = more context per chunk but less precise retrieval |
| num-ctx | LLM context window | Larger = can process more chunks but uses more memory/time |
| temperature | LLM creativity | Lower = more factual; higher = more creative |

Key Learnings

  1. Spring AI makes RAG simple — QuestionAnswerAdvisor handles the entire retrieve-augment-generate pipeline in one line.
  2. Ollama runs locally — No API keys, no cloud costs, full privacy. Swap models by changing one property (spring.ai.ollama.chat.options.model). Browse available models at ollama.com/library.
  3. @Async needs separate beans — Spring's proxy-based AOP doesn't intercept self-invocations. Always put @Async methods in a different @Component.
  4. Chunk size matters — Too large and retrieval returns irrelevant context; too small and you lose semantic meaning.
  5. Small models need simple prompts — llama3.2:1b can't follow complex multi-rule system prompts. Keep instructions short and direct.
  6. Docker Compose integration — Spring Boot auto-detects compose.yaml, starts the containers on app launch, auto-configures connections, and stops them on shutdown. No manual docker compose up needed.

License

MIT


Contributing

Contributions are welcome! Feel free to open issues or submit pull requests.
