NEL Demo - spaCy NER+NEL GUI

A simple demonstration application for Named Entity Recognition (NER) and Named Entity Linking (NEL) using spaCy models with a minimal GUI interface.

Features

✅ Easy Installation: Automated installers for Windows (PowerShell) and Linux/Mac (Bash)
✅ Python Version Check: Ensures Python 3.10 or higher is installed
✅ Virtual Environment: Automatically creates and manages a virtual environment
✅ Flexible Dependencies: Choose between standard spaCy or spacy-transformers
✅ Simple GUI: User-friendly interface built with tkinter
✅ Model Management: Load custom trained models from the models/ directory
✅ Text Processing: Process any text and extract named entities
✅ Cyrillic Transliteration: Automatic transliteration from Cyrillic to Latin script for better NER accuracy
✅ Smart Text Chunking: Automatically handles large texts by chunking on paragraph boundaries
✅ Visual Output: Generate beautiful HTML visualizations using displaCy
✅ Output Management: Save all outputs to data/outputs/ with timestamps
✅ Comprehensive Testing: Full test suite with pytest

Project Structure

NEL_Demo/
├── install.ps1              # Windows installer (PowerShell)
├── install.sh               # Linux/Mac installer (Bash)
├── requirements.txt         # Python dependencies
├── README.md               # This file
├── src/
│   ├── gui.py              # Main GUI application
│   └── text_chunker.py     # Text chunking module for large documents
├── tests/
│   └── test_text_chunker.py # Test suite for text chunking
├── models/                 # Place your trained models here
│   └── {model_name}/
│       └── model-best/     # Your trained spaCy model
├── inputs/                 # Input text files
│   └── sample_text.txt     # Sample text file
├── data/
│   └── outputs/            # HTML visualization outputs
└── venv/                   # Virtual environment (created by installer)

Requirements

Python: 3.10 or higher
Operating System: Windows, Linux, or macOS
spaCy Model: A trained spaCy model placed in models/{model_name}/model-best/

Installation

Windows (PowerShell)

Open PowerShell
Navigate to the project directory
Run the installer:

.\install.ps1

Linux/Mac (Bash)

Open a terminal
Navigate to the project directory
Run the installer:

./install.sh

What the Installer Does

The installer will:

✅ Check if Python 3.10+ is installed
✅ Create a virtual environment in venv/
✅ Activate the virtual environment
✅ Upgrade pip to the latest version
✅ Ask you to choose between:
- Standard spaCy (faster, smaller)
- spacy-transformers (more accurate, larger)
✅ Install all required dependencies
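
The first two installer steps can be illustrated in Python. This is only a sketch under assumptions: the real installers are the PowerShell and Bash scripts, and the helper names `python_ok` and `venv_python` are hypothetical, not part of this project.

```python
import sys
from pathlib import Path

# Minimum interpreter version stated in the Requirements section.
MIN_PYTHON = (3, 10)


def python_ok(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum."""
    v = sys.version_info if version_info is None else version_info
    return tuple(v[:2]) >= minimum


def venv_python(venv_dir="venv", platform=sys.platform):
    """Return the interpreter path inside a virtual environment.

    Windows environments use Scripts/, Linux and macOS use bin/.
    """
    venv_dir = Path(venv_dir)
    if platform.startswith("win"):
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


if not python_ok():
    print("Python 3.10 or higher is required")
```

Comparing version tuples avoids the pitfalls of comparing version strings, where "3.9" would sort after "3.10".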

Setting Up a Model

Pre-installed Model

A Serbian NER+NEL model (trsic4-CNN-ner-nel) is already installed in the models/ directory and ready to use. No additional setup is required!

Using Your Own Trained Model

If you have a trained spaCy model:

Create a directory: models/{your_model_name}/
Place your trained model in: models/{your_model_name}/model-best/

The structure should look like:

models/
└── your_model_name/
    └── model-best/
        ├── config.cfg
        ├── meta.json
        ├── tokenizer
        ├── ner/
        └── ... (other model files)
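
To check that a model sits where the application expects it, a small script can scan the models/ directory for this layout. The function name `find_models` is hypothetical, and treating `config.cfg` plus `meta.json` as the validity check is an assumption based on the structure shown above, not necessarily what gui.py does.

```python
from pathlib import Path


def find_models(models_dir="models"):
    """Map each model name to its model-best/ directory.

    A model is reported only if model-best/ contains both config.cfg
    and meta.json (an assumed minimal validity check).
    """
    root = Path(models_dir)
    if not root.is_dir():
        return {}
    found = {}
    for child in sorted(root.iterdir()):
        best = child / "model-best"
        if (best / "config.cfg").is_file() and (best / "meta.json").is_file():
            found[child.name] = best
    return found


for name, path in find_models().items():
    print(f"{name}: {path}")
```

A directory returned by this scan can be passed to `spacy.load()`, which accepts a path to a saved pipeline.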

Usage

Starting the Application

Windows:

.\venv\Scripts\Activate.ps1
python src/gui.py

Linux/Mac:

source venv/bin/activate
python src/gui.py

Using the GUI

1. Select a Model:
- Choose your model from the dropdown
- Click "Load Model" to load it
- Wait for the confirmation message
2. Configure Processing Options:
- Transliterate Cyrillic to Latin: Enabled by default (if cyrtranslit is installed)
- This option automatically converts Cyrillic text to Latin before processing for better entity recognition
3. Enter Text:
- Type or paste text into the input area
- Or click "Load Sample Text" for a demo
- Or click "Load from File" to load a text file from the inputs/ folder
4. Process Text:
- Click "Process Text (NER)" to analyze the text
- View entities in the results section
- HTML visualization is automatically saved
5. View Results:
- Click "View Last Output" to open the HTML in your browser
- Click "Open Output Folder" to see all saved outputs

Cyrillic Transliteration Feature

The application includes automatic Cyrillic-to-Latin transliteration to improve NER accuracy when using models trained primarily on Latin script:

Automatic Conversion: Converts Cyrillic text to Latin script (supports Serbian, Montenegrin, Macedonian, Russian, Ukrainian, Kazakh, and Bulgarian) before processing
Enabled by Default: The transliteration option is checked by default (if cyrtranslit is installed)
Toggleable: Can be disabled via the checkbox if you prefer to process Cyrillic text directly
Preserves Entities: Latin text remains unchanged; only Cyrillic characters are transliterated
Better Accuracy: Models trained on Latin script typically perform better with transliterated text

Example: The Cyrillic text "Новак Ђоковић рођен у Београду" is automatically transliterated to "Novak Đoković rođen u Beogradu" before being sent to the NER pipeline.

Note: If you have a model specifically trained on Cyrillic text, you can disable this option by unchecking the "Transliterate Cyrillic to Latin before processing" checkbox.
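
The idea behind the conversion can be shown with a character mapping for Serbian. This is an illustrative stand-in using only the standard library; the application itself relies on the cyrtranslit package listed under Dependencies, and the names `SR_CYR_TO_LAT` and `to_latin_sr` are hypothetical.

```python
# Lowercase Serbian Cyrillic letters and their Latin equivalents.
_SR_LOWER = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ",
    "е": "e", "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k",
    "л": "l", "љ": "lj", "м": "m", "н": "n", "њ": "nj", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "ћ": "ć", "у": "u",
    "ф": "f", "х": "h", "ц": "c", "ч": "č", "џ": "dž", "ш": "š",
}

# Translation table keyed by code point, covering both cases.
# Uppercase digraphs become title case (e.g. "Lj"), which is a
# simplification for words written entirely in capitals.
SR_CYR_TO_LAT = {}
for cyr, lat in _SR_LOWER.items():
    SR_CYR_TO_LAT[ord(cyr)] = lat
    SR_CYR_TO_LAT[ord(cyr.upper())] = lat.capitalize()


def to_latin_sr(text):
    """Transliterate Serbian Cyrillic to Latin; other characters pass through."""
    return text.translate(SR_CYR_TO_LAT)


print(to_latin_sr("Новак Ђоковић рођен у Београду"))
```

Because only Cyrillic code points appear in the table, text that is already in Latin script is returned unchanged.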

Example

Try this sample text:

Apple Inc. is an American multinational technology company headquartered 
in Cupertino, California. Tim Cook is the CEO of Apple. The company was 
founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in 1976.

The application will:

Extract entities like "Apple Inc." (ORG), "Tim Cook" (PERS), "Cupertino" (LOC)
Link entities to Wikidata (NEL) with Q-IDs where available
Show entity labels and positions
Generate an HTML visualization with highlighted entities
Save the output to data/outputs/ner_output_YYYYMMDD_HHMMSS.html

Note: The Serbian NER+NEL model (trsic4-CNN-ner-nel) recognizes these entity types: PERS (person), LOC (location), ORG (organization), EVENT, DEMO (demonym), IDEO (ideology), PRODUCT, ROLE, and WORK.
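
The steps above can be sketched as a function that turns a processed document into rows of entity data. It reads the attributes spaCy exposes on entity spans (`text`, `label_`, `start_char`, `end_char`, `kb_id_`). The function name `entity_rows`, the Wikidata URL pattern, and the stand-in document are assumptions for illustration; a real `Doc` would come from calling the loaded pipeline on the text.

```python
from types import SimpleNamespace

WIKIDATA_URL = "https://www.wikidata.org/wiki/{qid}"


def entity_rows(doc):
    """Collect text, label, character offsets and Wikidata link per entity."""
    rows = []
    for ent in doc.ents:
        qid = getattr(ent, "kb_id_", "")
        # An empty ID or "NIL" means the entity was not linked.
        if qid in ("", "NIL"):
            qid = None
        rows.append({
            "text": ent.text,
            "label": ent.label_,
            "start": ent.start_char,
            "end": ent.end_char,
            "qid": qid,
            "url": WIKIDATA_URL.format(qid=qid) if qid else None,
        })
    return rows


# Stand-in for a spaCy Doc so the sketch runs without a model (hypothetical data).
fake_doc = SimpleNamespace(ents=[
    SimpleNamespace(text="Apple Inc.", label_="ORG",
                    start_char=0, end_char=10, kb_id_="Q312"),
    SimpleNamespace(text="Cupertino", label_="LOC",
                    start_char=74, end_char=83, kb_id_=""),
])

for row in entity_rows(fake_doc):
    print(row)
```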

Text Processing with Paragraph Chunking

The application automatically uses chunking for any text with multiple paragraphs:

Smart Chunking: Paragraphs are grouped into chunks of up to 100,000 characters each to preserve logical structure and improve NER accuracy
Automatic Processing: Each chunk is processed separately with spaCy NER
Merged Output: All chunks are combined into a single HTML visualization
Visual Separation: Section breaks are added between chunks in the output
Better Context: Processing text with paragraph boundaries helps spaCy maintain clearer context for entity recognition

Single-paragraph texts are processed in a single pass, with no chunking overhead. Chunking on paragraph boundaries is intended to keep NER quality high while preserving the readability and structure of the original text.
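
A minimal sketch of paragraph-based chunking follows. It is not the code in src/text_chunker.py: the function names and the choice to split on blank lines are assumptions, and only the 100,000-character limit comes from the description above.

```python
import re


def split_paragraphs(text):
    """Split on blank lines and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def chunk_paragraphs(text, max_chars=100_000):
    """Group whole paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole in its own
    chunk rather than being cut mid-paragraph.
    """
    chunks, current, size = [], [], 0
    for para in split_paragraphs(text):
        # Count the two-character separator when joining to earlier paragraphs.
        added = len(para) + (2 if current else 0)
        if current and size + added > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
            added = len(para)
        current.append(para)
        size += added
    if current:
        chunks.append("\n\n".join(current))
    return chunks


print(chunk_paragraphs("aaaa\n\nbbbb\n\ncccc", max_chars=10))
```

With a limit of 10 characters, the first two paragraphs fit together exactly and the third starts a new chunk.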

Output Format

Each processed text generates an HTML file with:

Original text with highlighted entities
Color-coded entity types
Interactive visualization
Timestamp in the filename

Output files are saved in: data/outputs/
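
The timestamped filename follows the pattern ner_output_YYYYMMDD_HHMMSS.html. One way to build such a path is sketched below; the function name `output_path` is hypothetical, and the HTML content itself would come from displaCy.

```python
from datetime import datetime
from pathlib import Path


def output_path(now=None, out_dir="data/outputs"):
    """Build a path like data/outputs/ner_output_YYYYMMDD_HHMMSS.html."""
    now = now or datetime.now()
    return Path(out_dir) / f"ner_output_{now:%Y%m%d_%H%M%S}.html"


print(output_path())
```

Putting the date before the time in the name means that sorting the folder by filename also sorts the outputs chronologically.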

Troubleshooting

"Python is not installed or not in PATH"

Install Python 3.10 or higher from python.org
Make sure to check "Add Python to PATH" during installation

"No models found"

Make sure you've placed a trained model in models/{model_name}/model-best/
Check that the model directory structure is correct
Use the pre-installed model or add your own trained model (see "Setting Up a Model")

"Error loading model"

Verify the model files are complete and not corrupted
Make sure the model is compatible with your spaCy version
Try re-downloading or re-training the model

GUI doesn't start

Make sure you've activated the virtual environment
Check that all dependencies are installed: pip list
On Linux, you may need to install tkinter: sudo apt-get install python3-tk
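
A quick way to see which modules are missing from the active environment is to ask Python whether it can locate them. This is a sketch; the function name `missing_modules` is hypothetical, and the module list is taken from the Dependencies section.

```python
import importlib.util


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Run this with the virtual environment activated.
missing = missing_modules(["tkinter", "spacy", "cyrtranslit"])
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules were found")
```

If tkinter appears in the list on Linux, installing python3-tk as described above is the likely fix.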

Advanced Usage

Training Your Own Model

To train a custom NER+NEL model with spaCy:

Prepare your training data
Create a spaCy project or config

Train the model:

python -m spacy train config.cfg --output ./models/my_model

The trained model will be in models/my_model/model-best/

For more information, see the spaCy training documentation.

Using Transformer Models

For better accuracy, use transformer-based models:

Install spacy-transformers during setup (option 2)
Train or download a transformer model
Place it in the models directory

Note: Transformer models are larger and slower, but typically more accurate.

Dependencies

Core dependencies (installed automatically):

spacy>=3.7.0 - Core NLP library
cyrtranslit>=1.0.0 - Cyrillic-to-Latin transliteration
tkinter-tooltip>=2.0.0 - GUI tooltips (optional)

Optional:

spacy-transformers - For transformer-based models

Development dependencies:

pytest - For running tests

Testing

The project includes comprehensive tests for the text chunking functionality.

To run the tests:

# Activate the virtual environment first
# Windows:
.\venv\Scripts\Activate.ps1

# Linux/Mac:
source venv/bin/activate

# Install pytest (if not already installed)
pip install pytest

# Run all tests
python -m pytest tests/test_text_chunker.py -v

# Run specific test class
python -m pytest tests/test_text_chunker.py::TestChunkText -v

The test suite includes:

Paragraph splitting tests: Verify correct handling of various paragraph formats
Text chunking tests: Ensure proper chunking at different size limits
HTML merging tests: Validate correct merging of multiple HTML outputs
Edge case tests: Test Unicode, special characters, very long sentences
Integration tests: End-to-end workflow validation

License

This project is dedicated to the public domain under CC0 1.0 Universal (CC0 1.0) Public Domain Dedication - see the LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

For issues or questions:

Check the troubleshooting section
Visit spaCy documentation
Open an issue on GitHub

Acknowledgments

Built with spaCy
Visualization powered by displaCy
GUI built with Python's tkinter

Happy Entity Recognition! 🎯
