DocSummarizer

A fully offline document summarization tool powered by a local AI model. Designed for scientists and researchers who need to quickly summarize academic papers and documents without sending data to external services.

Features

  • 100% Offline: After the initial model download, everything runs locally on your machine
  • Privacy-First: Documents never leave your computer - no cloud services, no data collection
  • Multiple Formats: Supports PDF, DOCX, DOC, RTF, TXT, and Markdown files
  • Batch Processing: Summarize entire folders of documents at once
  • Flexible Output: Choose between brief, detailed, or structured summaries
  • Cross-Platform: Works on Windows, macOS, and Linux
  • Adjustable CPU Usage: Control how many CPU threads to use via Settings

System Requirements

Requirement   Minimum                          Recommended
OS            Windows 10, macOS 10.14, Linux   Latest version
RAM           8 GB                             16 GB
Storage       6 GB free                        10 GB free
CPU           4 cores                          8+ cores
Python        3.10+                            3.11+

Note: The tool runs on CPU by default. GPU acceleration is optional (see Advanced section).

Quick Start

Option A: Download Standalone Executable (Easiest)

Download the latest release for your platform - no Python required:

  1. Go to Releases
  2. Download:
    • Windows: DocSummarizer.exe
    • Linux: DocSummarizer
  3. Run the executable
  4. On first launch, click "Download Model" (~4.4 GB, one-time)

Option B: Run from Source

1. Clone or Download

git clone https://github.com/Wintersta7e/Doc-Summarizer.git
cd Doc-Summarizer

2. Run Setup Script

Windows:

setup_and_run.bat

Linux/macOS:

chmod +x setup_and_run.sh
./setup_and_run.sh

3. First Launch

On first launch, the application will:

  1. Create a virtual environment
  2. Install dependencies
  3. Prompt you to download the AI model (~4.4 GB, one-time)

After setup, the GUI will open automatically.

Usage

Graphical Interface (GUI)

  1. Launch with python run.py or use the setup script
  2. Click Select File to choose a document
  3. Select summary type: Brief, Detailed, or Structured
  4. Click Summarize and wait for processing
  5. Save the result using Save Summary

Command Line Interface (CLI)

# Summarize a single file
python src/cli.py document.pdf

# Choose summary type
python src/cli.py document.pdf -t structured
python src/cli.py document.pdf -t brief
python src/cli.py document.pdf -t detailed

# Save output to file
python src/cli.py document.pdf -o summary.txt

# Batch process a folder
python src/cli.py ./papers/ -o ./summaries/

# Download model only (no processing)
python src/cli.py --download-only

Summary Types

Type        Description                                                Best For
Brief       1 paragraph (3-5 sentences)                                Quick overview
Detailed    Comprehensive with key points                              Understanding content
Structured  Organized sections (Purpose, Methods, Conclusions, etc.)   Academic papers
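In practice, the summary type is just a different instruction handed to the model. A minimal, purely illustrative sketch of how the three types could map to prompts (the exact wording DocSummarizer uses lives in its source and may differ):

# Illustrative prompt mapping - not the project's actual prompt text.
PROMPTS = {
    "brief": "Summarize the document in one paragraph of 3-5 sentences.",
    "detailed": "Write a comprehensive summary covering all key points.",
    "structured": (
        "Summarize the document under the headings "
        "Purpose, Methods, Results, and Conclusions."
    ),
}

def build_prompt(summary_type: str, text: str) -> str:
    """Wrap the instruction and document text in Mistral's [INST] format."""
    return f"[INST] {PROMPTS[summary_type]}\n\n{text} [/INST]"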

Project Structure

DocSummarizer/
├── run.py                  # Main entry point (GUI)
├── requirements.txt        # Python dependencies
├── README.md               # This file
├── DEVELOPMENT.md          # Developer documentation
├── DocSummarizer.spec      # PyInstaller build configuration
├── setup_and_run.bat       # Windows launcher
├── setup_and_run.sh        # Linux/macOS launcher
├── .gitignore              # Git ignore rules
└── src/
    ├── __init__.py
    ├── gui.py              # GUI application (CustomTkinter)
    ├── cli.py              # Command-line interface
    ├── document_parser.py  # Document text extraction
    ├── model_manager.py    # LLM download and inference
    └── logger.py           # Logging and diagnostics

How It Works

  1. Document Parsing: Extracts text from PDF, DOCX, and other formats using pypdf and python-docx
  2. Text Processing: Prepares the extracted text for the AI model
  3. Local LLM Inference: Uses llama-cpp-python to run a quantized Mistral 7B model
  4. Summary Generation: The model generates a summary based on the selected type (a minimal end-to-end sketch follows this list)
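Putting the pipeline together, here is a minimal sketch, assuming the GGUF model file is already downloaded. File names, prompt wording, and the truncation limit are illustrative; the real logic lives in src/document_parser.py and src/model_manager.py.

from pypdf import PdfReader
from llama_cpp import Llama

# 1. Extract text from a PDF (python-docx handles .docx the same way).
reader = PdfReader("paper.pdf")
text = "\n".join((page.extract_text() or "") for page in reader.pages)

# 2-3. Load the quantized Mistral model and run inference locally on the CPU.
llm = Llama(
    model_path="mistral-7b-instruct-v0.2.Q4_K_M.gguf",  # assumed file name
    n_ctx=8192,      # matches the context window listed under Model Information
    n_threads=4,     # see the Performance section for the default
    verbose=False,
)

# 4. Generate the summary with an instruction-style prompt.
prompt = f"[INST] Summarize the following document briefly:\n\n{text[:12000]} [/INST]"
result = llm(prompt, max_tokens=512)
print(result["choices"][0]["text"].strip())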

Model Information

Property        Value
Model           Mistral 7B Instruct v0.2
Quantization    Q4_K_M (4-bit)
Size            ~4.4 GB
Source          HuggingFace (TheBloke)
Context Window  8192 tokens

The model is downloaded on first launch and stored in:

  • Windows: %LOCALAPPDATA%\DocSummarizer\models\
  • macOS: ~/Library/Application Support/DocSummarizer/models/
  • Linux: ~/.local/share/DocSummarizer/models/
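These locations follow each platform's standard user data directory. A sketch of how such a path could be resolved in Python; whether DocSummarizer uses the platformdirs package or hand-rolled paths is an assumption, and the list above is the authoritative reference.

from pathlib import Path
from platformdirs import user_data_dir

# appauthor=False avoids an extra vendor folder on Windows, yielding
# %LOCALAPPDATA%\DocSummarizer rather than %LOCALAPPDATA%\Vendor\DocSummarizer.
models_dir = Path(user_data_dir("DocSummarizer", appauthor=False)) / "models"
print(models_dir)  # e.g. ~/.local/share/DocSummarizer/models on Linux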

Performance

Document Size        Processing Time (CPU)
Short (1-5 pages)    30-60 seconds
Medium (5-15 pages)  1-2 minutes
Long (15+ pages)     2-3 minutes

Note: Times vary with your CPU and thread settings. By default, the tool uses half of the available CPU cores to balance speed and system responsiveness; adjust this under Settings > CPU Threads if needed.
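A minimal sketch of that default; the exact expression DocSummarizer evaluates may differ, but the resulting value is what would be passed as n_threads to llama-cpp-python.

import os

# Half of the available cores, but never fewer than one thread.
n_threads = max(1, (os.cpu_count() or 2) // 2)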

Troubleshooting

Model download fails

  • Check internet connection
  • Ensure 5+ GB free disk space
  • Try running as administrator

Out of memory

  • Close other applications
  • Ensure at least 8 GB RAM
  • Process smaller documents

Slow performance

  • Normal on CPU - the model is computationally intensive
  • Increase CPU threads in Settings for faster processing
  • Close other applications to free resources
  • Consider GPU acceleration (see DEVELOPMENT.md)

High CPU usage

  • Go to Settings > CPU Threads and lower the thread count
  • Using fewer threads reduces CPU load but increases processing time

Checking logs for errors

Log files are stored at:

  • Windows: %LOCALAPPDATA%\DocSummarizer\logs\
  • Linux: ~/.local/share/DocSummarizer/logs/

Logs contain startup info, performance metrics, and error details (no document content is logged).

PDF extraction issues

  • Some scanned PDFs (image-only) cannot be parsed
  • Password-protected PDFs are not supported
  • Try converting to DOCX first
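A quick way to check whether a PDF has any extractable text at all: the standalone snippet below uses pypdf, the same library the parser relies on, and is illustrative rather than part of DocSummarizer itself.

from pypdf import PdfReader

reader = PdfReader("paper.pdf")
if reader.is_encrypted:
    print("Password-protected PDF - not supported.")
else:
    text = "".join((page.extract_text() or "") for page in reader.pages)
    if not text.strip():
        print("No extractable text - likely a scanned, image-only PDF.")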

Privacy & Security

  • No internet required after model download
  • No telemetry or usage tracking
  • No data collection - documents processed in memory only
  • Open source - audit the code yourself

License

MIT License - See LICENSE file for details.

Acknowledgments
