Rogaton/TraductAL


TraductAL - Multilingual, Multimodal Translation System

A neuro-symbolic approach combining neural MT with Prolog-based validation

65+ languages • 100% offline • Privacy-focused • No data collection

A multilingual translation system that runs entirely on your computer. No internet required after setup, no data sent anywhere.


📋 License: MIT License - Free for all uses • Support development


🎯 What It Does

  • Translates text between 65+ languages
  • Works completely offline after initial setup
  • Supports mainstream languages (English, French, German, Spanish, Russian, Chinese, Arabic, etc.)
  • Supports low-resource languages (Romansh dialects, Celtic languages, etc.)
  • Optional speech-to-text and text-to-speech
  • Web interface + command-line tools

🚀 Quick Start

# 1. Install
git clone https://github.com/Rogaton/TraductAL
cd TraductAL
pip install -r requirements.txt

# 2. Download models (one-time, ~3-10GB)
python download_nllb_200.py

# 3. Launch web interface
./start_gradio.sh

# Open browser to http://localhost:7860

🌍 Supported Languages

50 Mainstream Languages (via NLLB-200):

  • European: English, French, German, Italian, Spanish, Portuguese, Dutch, Polish, Swedish, Danish, Norwegian, Finnish, Greek, Turkish, Romanian, Czech, Hungarian, and more
  • World: Russian, Chinese, Hindi, Arabic, Japanese, Korean
  • Asian: Vietnamese, Thai, Indonesian, Malay, Tamil, Bengali, Urdu, Persian, Hebrew
  • African: Swahili, Amharic, Hausa, Yoruba
  • Regional: Catalan, Galician, Basque, Ukrainian, Bulgarian, Serbian, Croatian, and more

15+ Low-Resource Languages (via Apertus-8B):

  • Romansh: All 6 variants (Sursilvan, Vallader, Puter, Surmiran, Sutsilvan, Rumantsch Grischun)
  • Celtic: Welsh, Scottish Gaelic, Irish, Breton
  • Regional: Occitan, Luxembourgish, Friulian, Ladin, Sardinian

🔒 Privacy & Offline

  • 100% offline after initial model download
  • No data collection - everything stays on your machine
  • No internet required for translation
  • Perfect for confidential documents
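The README does not say how offline operation is enforced. As a minimal sketch, assuming the models are loaded through the Hugging Face `transformers` and `huggingface_hub` libraries (an assumption, not something stated here), the standard offline switches of those libraries can be set before anything imports them. The helper name `offline_mode_enabled` is hypothetical.

```python
import os

# Assumption: the translation models are loaded via Hugging Face libraries.
# Both variables must be set before those libraries are imported.
os.environ["HF_HUB_OFFLINE"] = "1"        # huggingface_hub: do not contact the Hub
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # transformers: use locally cached files only


def offline_mode_enabled() -> bool:
    """Return True when both offline switches are set to "1"."""
    return all(
        os.environ.get(name) == "1"
        for name in ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE")
    )


print(offline_mode_enabled())
```

With these set, a missing model raises an error instead of triggering a silent download, which makes accidental network access easier to notice.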

⚑ Usage

Web Interface (Recommended)

./start_gradio.sh
# Open http://localhost:7860

Command Line

# Simple translation
./translate_enhanced.sh en fr "Hello, how are you?"

# Output: Bonjour, comment allez-vous?

Python API

from unified_translator import UnifiedTranslator

translator = UnifiedTranslator()
result = translator.translate("Hello world", "en", "fr")
print(result["translation"])  # Bonjour le monde
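For several sentences, the same call can be wrapped in a small loop. The sketch below relies only on what the example above shows: a `translate(text, source, target)` method returning a dict with a `"translation"` key. `translate_many` and `EchoTranslator` are hypothetical names introduced here; the stand-in class exists only so the snippet runs without the models installed.

```python
from typing import Dict, Iterable, List


class EchoTranslator:
    """Hypothetical stand-in so the sketch runs without models installed."""

    def translate(self, text: str, source: str, target: str) -> Dict[str, str]:
        # Mimics the result shape shown above: a dict with a "translation" key.
        return {"translation": f"[{source}->{target}] {text}"}


def translate_many(translator, texts: Iterable[str], source: str, target: str) -> List[str]:
    """Translate each text in turn and collect the translated strings."""
    results = []
    for text in texts:
        result = translator.translate(text, source, target)
        results.append(result["translation"])
    return results


if __name__ == "__main__":
    # Swap EchoTranslator() for UnifiedTranslator() once the models are downloaded.
    for line in translate_many(EchoTranslator(), ["Hello world", "Good morning"], "en", "fr"):
        print(line)
```

Creating the translator once and reusing it across calls avoids reloading the models for every sentence, assuming the real class loads them at construction time.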

💻 System Requirements

Minimum:

  • Python 3.8+
  • 8GB RAM
  • 5GB disk space

Recommended:

  • Python 3.10+
  • 16GB RAM
  • 10GB disk space
  • GPU optional (faster with GPU)
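The minimum requirements can be checked before the first download. This is a minimal sketch using only the Python standard library, not the project's own `startup_check.py`; the function name and thresholds are taken from the list above or invented for illustration. RAM is left out because the standard library has no portable way to query it.

```python
import shutil
import sys
from typing import List, Tuple

MIN_PYTHON: Tuple[int, int] = (3, 8)  # minimum Python version listed above
MIN_FREE_DISK_GB = 5                  # minimum disk space listed above


def check_requirements(path: str = ".") -> List[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ required, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    # Free space on the filesystem that contains `path`, in GiB.
    free_gb = shutil.disk_usage(path).free / 1024**3
    if free_gb < MIN_FREE_DISK_GB:
        problems.append(
            f"{MIN_FREE_DISK_GB} GB free disk required, found {free_gb:.1f} GB"
        )
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("OK" if not issues else "\n".join(issues))
```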

⚠️ Important Notes

  • Development software: Use at your own risk
  • Translation quality varies by language pair
  • Not for critical use: Have a qualified human review any translation intended for professional or high-stakes purposes
  • First run is slow: Models download automatically (~3-10GB)

📚 Documentation

  • Full technical documentation: See docs/README_DETAILED.md for complete details
  • Adding languages: See docs/ADD_LANGUAGES_GUIDE.md
  • Batch translation: See docs/BATCH_TRANSLATION_EXAMPLES.md
  • Audio features: See docs/MULTIMODAL_GUIDE.md
  • Architecture & integration: See docs/INTEGRATION_ARCHITECTURE.md
  • Prolog validation: See docs/DCG_PARSER_SUMMARY.md
  • All documentation: Browse the docs/ directory

🛠️ Neuro-Symbolic Architecture

TraductAL combines neural and symbolic approaches:

Neural Translation Engines

  1. NLLB-200 (Meta): Fast, accurate, 200+ languages
  2. Apertus-8B: Specialized for low-resource languages (1811 languages)

Symbolic Validation Layer

  1. Trealla-Prolog: Dependency grammar parser for glossary validation
    • Checks and corrects potential neural model errors
    • Uses Prolog-based lexicon and grammar rules
    • Helps prevent hallucinations from NLLB-200 and Apertus LLMs
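The idea of glossary validation can be illustrated in a few lines. The project's validation layer is written in Prolog; the Python below is only a simplified sketch of the principle (flag source terms whose required target term is missing from the output), not a port of that parser. The function name and the glossary entries are hypothetical.

```python
from typing import Dict, List


def missing_glossary_terms(source: str, translation: str, glossary: Dict[str, str]) -> List[str]:
    """Return source terms whose required target term is absent from the translation."""
    source_lower = source.lower()
    translation_lower = translation.lower()
    return [
        term
        for term, required in glossary.items()
        if term.lower() in source_lower and required.lower() not in translation_lower
    ]


# Hypothetical English-to-French glossary, invented for this example.
GLOSSARY = {"data protection": "protection des données", "court": "tribunal"}

flagged = missing_glossary_terms(
    "The court ruled on data protection.",
    "La cour a statué sur la protection des données.",
    GLOSSARY,
)
print(flagged)  # ['court']
```

Plain substring matching is the main simplification: it would also match "court" inside "courtesy" and knows nothing about inflection, which is the kind of gap a grammar-based parser is meant to close.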

The system automatically picks the best model for your language pair and validates outputs through the symbolic layer.
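How the engine is chosen is not spelled out here. One plausible rule, sketched below under that assumption, routes a request to Apertus-8B when either side of the pair is one of the low-resource languages listed earlier and to NLLB-200 otherwise. The ISO 639 codes, the set contents, and the name `pick_engine` are illustrative, not the project's actual tables.

```python
# Assumption: ISO 639 codes for the low-resource languages named in this README
# (Romansh, Welsh, Scottish Gaelic, Irish, Breton, Occitan, Luxembourgish,
# Friulian, Ladin, Sardinian). The project's real code list may differ.
APERTUS_LANGS = frozenset({"rm", "cy", "gd", "ga", "br", "oc", "lb", "fur", "lld", "sc"})


def pick_engine(source: str, target: str) -> str:
    """Return 'apertus' if either language is low-resource, else 'nllb'."""
    if source in APERTUS_LANGS or target in APERTUS_LANGS:
        return "apertus"
    return "nllb"


print(pick_engine("en", "fr"))  # nllb
print(pick_engine("de", "rm"))  # apertus
```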

🎓 Academic Use

See AUTHORSHIP_AND_ATTRIBUTION.md for citation guidelines and transparency about AI-assisted development.

📂 Project Structure

TraductAL/
├── README.md                    # This file - user guide
├── QUICKSTART.md               # Quick start guide
├── LICENSE                     # MIT License
├── AUTHORSHIP_AND_ATTRIBUTION.md  # Academic citations
├── requirements.txt            # Core dependencies
├── requirements_enhanced.txt   # Optional features (STT/TTS)
│
├── Core Application Files
│   ├── gradio_app.py          # Main web interface (65+ languages)
│   ├── unified_translator.py   # Unified translation engine
│   ├── nllb_translator.py     # NLLB-200 engine
│   ├── apertus_translator.py  # Apertus-8B engine
│   ├── apertus_trealla_hybrid.py  # Hybrid neural-symbolic
│   ├── whisper_stt.py         # Speech-to-text
│   ├── tts_engine.py          # Text-to-speech
│   └── startup_check.py       # System verification
│
├── Scripts
│   ├── start_gradio.sh        # Launch web interface
│   ├── translate_enhanced.sh  # CLI translation
│   └── download_nllb_200.py   # Download models
│
├── glossary_parser/           # Prolog DCG parser (linguistic)
├── docs/                      # All documentation (40+ files)
├── scripts/                   # Utility scripts & training
├── data/samples/              # Test data & samples
└── docker/                    # Docker configuration

📄 License

Dual Licensing Options

TraductAL is available under dual licensing to serve both academic and commercial needs:

🎓 MIT License (Academic & Non-Commercial)

FREE for:

  • Universities and research institutions
  • Non-profit organizations
  • Personal use and experimentation
  • Startups with revenue < $100,000 USD
  • Open-source projects

See LICENSE for full terms.

πŸ’ Supporting Development

TraductAL is developed by an independent researcher. If you find it useful:

Optional donations help support continued development. See SUPPORT.md for options.

Academic collaboration: Contact relanir@bluewin.ch for research partnerships.

Third-Party Model Licenses

TraductAL integrates open-source models with their own licenses:

  • NLLB-200: CC-BY-NC 4.0 (non-commercial only) - see COMMERCIAL_LICENSE.md for commercial alternatives
  • Apertus-8B: Apache 2.0 (commercial use permitted)

Need the full technical documentation? See docs/README_DETAILED.md for complete details.