Second Brain — PDF Textbook Search via MCP Server

Overview

An MCP server that gives an AI agent direct search and retrieval access to a library of PDF textbooks. No vector database, no embeddings, no preprocessing: just raw PDFs and keyword search driven by agent-generated queries.

The agent uses query expansion (generating multiple synonym/related search phrases) to overcome keyword search limitations, then retrieves and synthesizes relevant passages with proper citations.

Architecture

User Question
      │
      ▼
┌─────────────┐
│  AI Agent   │  ← Generates ~5 search phrases from user's question
└─────┬───────┘
      │ MCP tool calls
      ▼
┌─────────────────────┐
│   MCP Server        │
│                     │
│  Tools:             │
│  - search_books()   │
│  - list_books()     │
│  - list_chapters()  │
│  - get_chapter()    │
│  - get_section()    │
│  - get_page_range() │
│                     │
│  Data: /books/*.pdf │
└─────────────────────┘

MCP Tools

`list_books()`

Returns all available PDF filenames/titles.
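
As a minimal sketch of how this tool might be wired up, assuming the Python MCP SDK's `FastMCP` helper and a `books/` directory next to the server: `find_books` and `build_server` are hypothetical names chosen for illustration, not taken from the repository.

```python
from pathlib import Path


def find_books(books_dir: Path) -> list[str]:
    """Return the PDF filenames in books_dir, sorted for stable output."""
    return sorted(p.name for p in books_dir.glob("*.pdf") if p.is_file())


def build_server(books_dir: Path = Path("books")):
    """Expose find_books as an MCP tool named list_books (assumed wiring)."""
    # Imported lazily so find_books stays usable without the SDK installed.
    from mcp.server.fastmcp import FastMCP

    server = FastMCP("second-brain")

    @server.tool()
    def list_books() -> list[str]:
        """List the PDF textbooks available for search."""
        return find_books(books_dir)

    return server


# To serve over stdio: build_server().run()
```

Keeping the directory scan in a plain function means it can be tested without starting a server.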

`list_chapters(book: string)`

Returns the chapter titles and page ranges for a given book (extracted from PDF bookmarks/TOC).
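
One way to derive page ranges from bookmarks is sketched below. It assumes the TOC arrives as a list of `[level, title, start_page]` entries with 1-based pages, which is the shape PyMuPDF's `Document.get_toc()` returns in its default simple form. The function name `chapters_from_toc` and the choice to treat only level-1 entries as chapters are assumptions for illustration.

```python
def chapters_from_toc(toc: list[list], page_count: int) -> list[dict]:
    """Turn [level, title, start_page] TOC entries into chapter page ranges.

    Only level-1 entries are treated as chapters (an assumption). Each
    chapter ends on the page before the next chapter starts; the last
    chapter runs to the end of the book.
    """
    top_level = [(title, page) for level, title, page in toc if level == 1]
    chapters = []
    for index, (title, start) in enumerate(top_level):
        if index + 1 < len(top_level):
            end = top_level[index + 1][1] - 1
        else:
            end = page_count
        # Guard against two chapters that begin on the same page.
        chapters.append(
            {"title": title, "start_page": start, "end_page": max(start, end)}
        )
    return chapters
```

PDFs without bookmarks would yield an empty list here, so a real implementation would likely need a fallback.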

`search_books(queries: string[])`

Keyword search across all PDFs using multiple queries simultaneously. Returns snippets (a few sentences of surrounding context) with book name, page number, and location for each hit. This is the primary entry point — the agent generates several synonym/related phrases to cast a wide net.
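
The multi-query search could look roughly like the sketch below. It assumes page text has already been extracted into memory (for example with PyMuPDF's `page.get_text()`) as a mapping from book name to a list of page strings. The name `search_pages`, the `context` parameter, and the hit fields are hypothetical, and the "location" field mentioned above is left out for brevity.

```python
import re


def search_pages(
    pages_by_book: dict[str, list[str]],
    queries: list[str],
    context: int = 80,
) -> list[dict]:
    """Case-insensitive keyword search returning one snippet per page and query."""
    hits = []
    for book, pages in pages_by_book.items():
        for page_number, text in enumerate(pages, start=1):
            for query in queries:
                if not query.strip():
                    continue  # an empty query would match every page
                match = re.search(re.escape(query), text, re.IGNORECASE)
                if match is None:
                    continue
                # Take `context` characters on each side of the first match.
                start = max(0, match.start() - context)
                end = min(len(text), match.end() + context)
                hits.append(
                    {
                        "book": book,
                        "page": page_number,
                        "query": query,
                        "snippet": " ".join(text[start:end].split()),
                    }
                )
    return hits
```

Recording which query produced each hit lets the agent see which of its expanded phrases actually matched.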

`get_chapter(book: string, chapter: string | int)`

Returns the full text of a chapter. Use when the agent needs broad context on a topic.
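
Because `chapter` accepts either a string or an int, the server has to decide how to interpret each. The sketch below shows one possible rule, assuming an int is a 1-based position in the chapter list and a string is matched against titles case-insensitively. `resolve_chapter` is a hypothetical helper, and the chapter dicts only need a `title` key.

```python
def resolve_chapter(chapters: list[dict], chapter) -> dict:
    """Find a chapter by 1-based position (int) or by title (str)."""
    if isinstance(chapter, int):
        if 1 <= chapter <= len(chapters):
            return chapters[chapter - 1]
        raise LookupError(f"chapter {chapter} is out of range")

    wanted = chapter.strip().lower()
    # Prefer an exact title match, then fall back to a unique substring match.
    for entry in chapters:
        if entry["title"].lower() == wanted:
            return entry
    partial = [e for e in chapters if wanted in e["title"].lower()]
    if len(partial) == 1:
        return partial[0]
    raise LookupError(f"no unique chapter matches {chapter!r}")
```

Raising on ambiguous titles, rather than guessing, gives the agent a chance to call `list_chapters()` and retry.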

`get_section(book: string, chapter: string | int, section: string | int)`

Returns a specific section within a chapter. Use when the agent knows exactly where to look.

`get_page_range(book: string, start_page: int, end_page: int)`

Returns the text of a specific page range. Use for grabbing context around a search hit.
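
A possible shape for the page-range lookup is sketched below, assuming 1-based inclusive page numbers and a list of already-extracted page strings. The name `page_range_text` and the `[page N]` markers are illustrative choices; the markers are there so the agent can cite the right page.

```python
def page_range_text(pages: list[str], start_page: int, end_page: int) -> str:
    """Join the text of pages start_page..end_page (1-based, inclusive)."""
    if start_page > end_page:
        raise ValueError("start_page must not exceed end_page")
    # Clamp to the pages that actually exist instead of failing.
    first = max(1, start_page)
    last = min(len(pages), end_page)
    return "\n\n".join(
        f"[page {number}]\n{pages[number - 1]}"
        for number in range(first, last + 1)
    )
```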

Agent Workflow

1. User asks a question.
2. Query expansion: the agent generates ~5 search terms (synonyms, related phrases, alternate terminology) to handle vocabulary mismatch.
3. Search: the agent calls `search_books()` with all terms.
4. Drill down: based on snippet results, the agent uses `get_section()` or `get_page_range()` to pull full context where needed.
5. Synthesize: the agent combines the retrieved text into an answer.
6. Cite: the agent quotes relevant passages with book name and page number.
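
For the drill-down step, several hits often land on neighbouring pages, so it can help to merge them before calling `get_page_range()`. The sketch below is one way to do that, assuming each hit contributes its page plus `padding` pages on either side. `context_windows` and its parameters are hypothetical and not part of the tool list above.

```python
from typing import Optional


def context_windows(
    hit_pages: list[int],
    padding: int = 1,
    last_page: Optional[int] = None,
) -> list[tuple[int, int]]:
    """Merge hit pages into non-overlapping (start_page, end_page) windows."""
    windows: list[tuple[int, int]] = []
    for page in sorted(set(hit_pages)):
        start = max(1, page - padding)
        end = page + padding
        if last_page is not None:
            end = min(last_page, end)
        # Extend the previous window when this one overlaps or touches it.
        if windows and start <= windows[-1][1] + 1:
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows
```

Each resulting window maps to a single `get_page_range()` call, which avoids fetching the same page twice.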

Example

User: "How does backpropagation work?"

Agent generates queries:

"backpropagation"
"backward pass gradient"
"chain rule neural network"
"error propagation"
"gradient computation layers"

Agent searches → finds hits → reads relevant sections → answers with quotes and page citations.

Design Decisions

| Decision | Rationale |
| --- | --- |
| No vector database | Eliminates infrastructure, embedding costs, and preprocessing. PDF search is computationally cheap. |
| No pre-generated summaries | Book titles are sufficient for the agent to know where to search. Keeps the system zero-config. |
| Query expansion over embeddings | Multiple keyword searches cover synonym/terminology gaps without needing semantic similarity. |
| Hierarchical retrieval tools | Mirrors how textbooks are organized (book → chapter → section → page). Lets the agent choose the right granularity. |
| MCP protocol | Any MCP-compatible client (Claude Desktop, custom apps, Claude Code) can connect without custom UI. |

Tech Stack

MCP SDK — Python (mcp) or TypeScript (@modelcontextprotocol/sdk)
PDF extraction — PyMuPDF (fitz) for text extraction and keyword search
No other dependencies — no database, no embedding model, no preprocessing pipeline

File Structure

second-brain/
├── server.py          # MCP server with tool definitions
├── pdf_tools.py       # PDF search and extraction logic
├── books/             # Drop PDF textbooks here
│   ├── deep_learning.pdf
│   ├── linear_algebra.pdf
│   └── ...
├── requirements.txt   # pymupdf, mcp
└── README.md

Getting Started

1. Drop PDFs into `books/`
2. Run the MCP server
3. Connect any MCP client
4. Ask questions

