A fully offline document summarization tool powered by a local AI model. Designed for scientists and researchers who need to quickly summarize academic papers and documents without sending data to external services.
- 100% Offline: After the initial model download, everything runs locally on your machine
- Privacy-First: Documents never leave your computer - no cloud services, no data collection
- Multiple Formats: Supports PDF, DOCX, DOC, RTF, TXT, and Markdown files
- Batch Processing: Summarize entire folders of documents at once
- Flexible Output: Choose between brief, detailed, or structured summaries
- Cross-Platform: Works on Windows, macOS, and Linux
- Adjustable CPU Usage: Control how many CPU threads to use via Settings
| Requirement | Minimum | Recommended |
|---|---|---|
| OS | Windows 10, macOS 10.14, Linux | Latest version |
| RAM | 8 GB | 16 GB |
| Storage | 6 GB free | 10 GB free |
| CPU | 4 cores | 8+ cores |
| Python | 3.10+ | 3.11+ |
Note: The tool runs on CPU by default. GPU acceleration is optional (see Advanced section).
Download the latest release for your platform - no Python required:
- Go to Releases
- Download the binary for your OS:
  - Windows: `DocSummarizer.exe`
  - Linux: `DocSummarizer`
- Run the executable
- On first launch, click "Download Model" (~4.4 GB, one-time)
To run from source instead, clone the repository and use the launcher script for your platform:

```bash
git clone https://github.com/Wintersta7e/Doc-Summarizer.git
cd Doc-Summarizer
```

Windows:

```bat
setup_and_run.bat
```

Linux/macOS:

```bash
chmod +x setup_and_run.sh
./setup_and_run.sh
```

On first launch, the application will:
- Create a virtual environment
- Install dependencies
- Prompt you to download the AI model (~4.4 GB, one-time)

After setup, the GUI will open automatically.
- Launch the GUI with `python run.py` or via the setup script
- Click Select File to choose a document
- Select summary type: Brief, Detailed, or Structured
- Click Summarize and wait for processing
- Save the result using Save Summary
```bash
# Summarize a single file
python src/cli.py document.pdf

# Choose summary type
python src/cli.py document.pdf -t structured
python src/cli.py document.pdf -t brief
python src/cli.py document.pdf -t detailed

# Save output to file
python src/cli.py document.pdf -o summary.txt

# Batch process a folder
python src/cli.py ./papers/ -o ./summaries/

# Download model only (no processing)
python src/cli.py --download-only
```

| Type | Description | Best For |
|---|---|---|
| Brief | 1 paragraph (3-5 sentences) | Quick overview |
| Detailed | Comprehensive with key points | Understanding content |
| Structured | Organized sections (Purpose, Methods, Conclusions, etc.) | Academic papers |
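A minimal sketch of how a summary type might map to a model instruction; the prompt strings and names below are illustrative, not the tool's actual prompts:

```python
# Hypothetical mapping from summary type to an instruction prompt.
# The exact prompts used by DocSummarizer may differ.
PROMPTS = {
    "brief": "Summarize the document in one paragraph of 3-5 sentences.",
    "detailed": "Write a comprehensive summary covering all key points.",
    "structured": "Summarize under the headings: Purpose, Methods, Conclusions.",
}

def build_prompt(summary_type: str, text: str) -> str:
    """Combine the chosen instruction with the document text,
    using the Mistral-Instruct [INST] wrapper."""
    instruction = PROMPTS[summary_type]
    return f"[INST] {instruction}\n\n{text} [/INST]"
```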
```
DocSummarizer/
├── run.py                # Main entry point (GUI)
├── requirements.txt      # Python dependencies
├── README.md             # This file
├── DEVELOPMENT.md        # Developer documentation
├── DocSummarizer.spec    # PyInstaller build configuration
├── setup_and_run.bat     # Windows launcher
├── setup_and_run.sh      # Linux/macOS launcher
├── .gitignore            # Git ignore rules
└── src/
    ├── __init__.py
    ├── gui.py              # GUI application (CustomTkinter)
    ├── cli.py              # Command-line interface
    ├── document_parser.py  # Document text extraction
    ├── model_manager.py    # LLM download and inference
    └── logger.py           # Logging and diagnostics
```
- Document Parsing: Extracts text from PDF, DOCX, and other formats using `pypdf` and `python-docx`
- Text Processing: Prepares the extracted text for the AI model
- Local LLM Inference: Uses `llama-cpp-python` to run a quantized Mistral 7B model
- Summary Generation: The model generates a summary based on the selected type
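The middle stages of that pipeline can be sketched as plain functions. This is a simplified illustration, not the project's actual code: `prepare_text`, `summarize`, and the character limit are invented here, and the model is injected as any prompt-to-text callable so the sketch runs without the 4.4 GB weights.

```python
def prepare_text(raw: str, max_chars: int = 24_000) -> str:
    """Collapse whitespace and truncate so the prompt fits the
    context window. The real tool would count tokens, not chars."""
    cleaned = " ".join(raw.split())
    return cleaned[:max_chars]

def summarize(raw_text: str, model, summary_type: str = "brief") -> str:
    """Text processing -> prompt -> inference.
    `model` is any callable mapping a prompt string to a completion."""
    text = prepare_text(raw_text)
    prompt = f"[INST] Write a {summary_type} summary:\n\n{text} [/INST]"
    return model(prompt)

# With llama-cpp-python installed and the model downloaded, the
# callable could be built roughly like this (illustrative):
# from llama_cpp import Llama
# llm = Llama(model_path="mistral-7b-instruct-v0.2.Q4_K_M.gguf", n_ctx=8192)
# model = lambda p: llm(p, max_tokens=512)["choices"][0]["text"]
```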
| Property | Value |
|---|---|
| Model | Mistral 7B Instruct v0.2 |
| Quantization | Q4_K_M (4-bit) |
| Size | ~4.4 GB |
| Source | HuggingFace (TheBloke) |
| Context Window | 8192 tokens |
The model is downloaded on first launch and stored in:
- Windows: `%LOCALAPPDATA%\DocSummarizer\models\`
- macOS: `~/Library/Application Support/DocSummarizer/models/`
- Linux: `~/.local/share/DocSummarizer/models/`
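Resolving those per-platform paths in code might look like the following sketch (the function name is hypothetical; the application may resolve its paths differently, e.g. via a library such as `platformdirs`):

```python
import os
import sys
from pathlib import Path

def model_dir() -> Path:
    """Return the platform-specific model directory listed above."""
    if sys.platform == "win32":
        base = Path(os.environ["LOCALAPPDATA"])
    elif sys.platform == "darwin":
        base = Path.home() / "Library" / "Application Support"
    else:
        # XDG default on Linux when XDG_DATA_HOME is unset
        base = Path(os.environ.get("XDG_DATA_HOME",
                                   Path.home() / ".local" / "share"))
    return base / "DocSummarizer" / "models"
```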
| Document Size | Processing Time (CPU) |
|---|---|
| Short (1-5 pages) | 30-60 seconds |
| Medium (5-15 pages) | 1-2 minutes |
| Long (15+ pages) | 2-3 minutes |
Note: Times vary based on CPU and thread settings. By default, the tool uses half of the available CPU cores to balance speed and system responsiveness. Adjust this in Settings > CPU Threads if needed.
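The documented default of half the available cores can be sketched in one line (the function name is illustrative):

```python
import os

def default_threads() -> int:
    """Half of the available CPU cores, but never fewer than one.
    os.cpu_count() can return None, hence the fallback."""
    return max(1, (os.cpu_count() or 2) // 2)
```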
Model download fails:
- Check your internet connection
- Ensure 5+ GB of free disk space
- Try running as administrator

Out-of-memory errors:
- Close other applications
- Ensure at least 8 GB of RAM
- Process smaller documents

Slow processing:
- Normal on CPU - the model is computationally intensive
- Increase CPU threads in Settings for faster processing
- Close other applications to free resources
- Consider GPU acceleration (see DEVELOPMENT.md)

High CPU usage:
- Go to Settings > CPU Threads and lower the thread count
- Using fewer threads reduces CPU load but increases processing time
Log files are stored at:
- Windows: `%LOCALAPPDATA%\DocSummarizer\logs\`
- Linux: `~/.local/share/DocSummarizer/logs/`
Logs contain startup info, performance metrics, and error details (no document content is logged).
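A content-free logging setup like the one described could be sketched as follows; the function name is hypothetical and not taken from `src/logger.py`:

```python
import logging
from pathlib import Path

def make_logger(log_dir: Path) -> logging.Logger:
    """File logger for startup info and performance metrics.
    Callers log only metadata (file name, page count, timings),
    never document text."""
    log_dir.mkdir(parents=True, exist_ok=True)
    logger = logging.getLogger("docsummarizer")
    handler = logging.FileHandler(log_dir / "app.log")
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger

# Usage: log sizes and timings, not content, e.g.
# logger.info("parsed %s (%d pages) in %.1fs", path.name, pages, dt)
```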
- Some scanned PDFs (image-only) cannot be parsed
- Password-protected PDFs are not supported
- Try converting to DOCX first
- No internet required after model download
- No telemetry or usage tracking
- No data collection - documents processed in memory only
- Open source - audit the code yourself
MIT License - See LICENSE file for details.
- llama.cpp - Efficient LLM inference
- Mistral AI - Base model
- TheBloke - Quantized models
- CustomTkinter - Modern GUI toolkit