Translation Memory Tester for Machine Translation

A CLI tool for testing translation memory (TM) systems by generating test data, creating TMX files, and calculating Levenshtein-based match scores.

Core idea

Many machine translation tools (e.g. DeepL Translate) now support translation memories. When a source segment matches an entry in the translation memory, the MT system prefers the human-validated translation over the AI-generated version. This improves the consistency and quality of the translation while keeping the flexibility of AI translation when there is no match.
But how well does this work in practice? This tool lets you find out. Here's the basic flow:

1. Generate text in English and translate it to German.
2. Build a translation memory from these texts, following the TMX standard.
3. Create a variant of the source text.
4. Upload the translation memory to your machine translation service. Most services that support translation memories accept the TMX format. Example: DeepL lets you upload TMX files in the customization hub.
5. Translate the source text variant with the MT service.
6. Evaluate how well the TM integration works in the MT output. Compare it with the matching report generated by this tool to see whether matches were inserted as expected.

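You will probably notice big differences between the expected and the actual matches, usually caused by segmentation misalignment: the MT service splits the text into different units than the translation memory does. This tool lets you experiment with different segmentation approaches to see which works better.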

Features

Text Generation: Generate English source text about electric vehicles in cities using the Claude API
Smart Segmentation: Segment text into TM-style units (short or long mode)
German Translation: Translate segments to casual, fluent German
TMX Export: Export to TMX 1.4 format for import into OmegaT, DeepL, memoQ, etc.
Variation Generation: Create test variations targeting specific match percentages (100%, 95-99%, 85-94%, 70-84%, 50-69%, 0-49%)
Match Scoring: Calculate Levenshtein-based similarity scores
Reports: Generate JSON and HTML reports with diff highlighting

Installation

Prerequisites

Python 3.10+
uv package manager (recommended)

Setup

# Clone the repository
git clone https://github.com/rhofkens/translation-memory-tester.git
cd translation-memory-tester

# Create virtual environment and install
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"

API Key Configuration

Set your Anthropic API key:

export ANTHROPIC_API_KEY='your-api-key'

Quick Start

Run the complete pipeline with a single command:

tmtest run-all --output-dir ./my-test

This will:

Generate ~500 words of English text about EVs
Segment the text into TM units
Translate to German (casual style)
Export to TMX format
Generate variations for different match levels
Calculate match scores and generate reports
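
If you prefer to drive the steps yourself, the sub-commands documented under Commands can be chained from a script. The following is a minimal sketch, assuming that `run-all` is roughly equivalent to running those commands in order (this README does not confirm that, and the optional `validate` step is left out). The helper names `pipeline_commands` and `run_pipeline` are hypothetical.

```python
import subprocess
from pathlib import Path


def pipeline_commands(out_dir: str, mode: str = "short") -> list[list[str]]:
    """Build the documented tmtest sub-commands, writing all files into out_dir."""
    out = Path(out_dir)

    def path(name: str) -> str:
        return str(out / name)

    return [
        ["tmtest", "generate", "--output", path("source.json")],
        ["tmtest", "segment", path("source.json"),
         "--output", path("segments.json"), "--mode", mode],
        ["tmtest", "translate", path("segments.json"),
         "--output", path("translated.json")],
        ["tmtest", "export-tmx", path("translated.json"),
         "--output", path("memory.tmx")],
        ["tmtest", "variate", path("segments.json"),
         "--output", path("variations.json")],
        ["tmtest", "match", path("variations.json"),
         "--tm", path("memory.tmx"), "--output", path("report.json")],
    ]


def run_pipeline(out_dir: str, mode: str = "short") -> None:
    """Run each step in order and stop at the first failing command."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for cmd in pipeline_commands(out_dir, mode):
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline("./my-test")` requires `tmtest` on your `PATH` and a configured API key. Running the steps separately makes it easy to swap in your own `segments.json` between two steps.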

Commands

`tmtest run-all`

Run the complete pipeline.

tmtest run-all --output-dir ./output --verbose --segment-mode short

Options:

--output-dir, -o: Output directory (default: ./output)
--verbose, -V: Show detailed output
--segment-mode, -m: short for better TM reuse, long for full sentences

`tmtest generate`

Generate English source text.

tmtest generate --output source.json

`tmtest segment`

Segment source text into TM units.

tmtest segment source.json --output segments.json --mode short

Options:

--mode, -m: short (splits at conjunctions) or long (full sentences)
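
To illustrate the difference between the two modes, here is a rough sketch of how such a segmenter could work. It is not the tool's actual implementation: the conjunction list, the regular expressions, and the function names are assumptions made for this example.

```python
import re

# Assumed conjunction list; the tool's real list may differ.
CONJUNCTIONS = ("and", "but", "or", "because", "while", "although")


def split_long(text: str) -> list[str]:
    """Long mode: one segment per sentence, split after ., ! or ?"""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def split_short(text: str) -> list[str]:
    """Short mode: also split each sentence at a comma followed by a conjunction."""
    pattern = re.compile(r",\s+(?=(?:%s)\b)" % "|".join(CONJUNCTIONS))
    segments = []
    for sentence in split_long(text):
        # The comma is dropped; the conjunction starts the next segment.
        segments.extend(s.strip() for s in pattern.split(sentence) if s.strip())
    return segments
```

Shorter segments are more likely to reappear unchanged in a variant text, which is why short mode tends to produce more TM hits.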

`tmtest translate`

Translate segments to German.

tmtest translate segments.json --output translated.json

`tmtest validate`

Validate German translations for coherence.

tmtest validate translated.json
tmtest validate translated.json --fix --output fixed.json

`tmtest export-tmx`

Export to TMX format.

tmtest export-tmx translated.json --output memory.tmx
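
For orientation, a TMX file is plain XML: a `header` followed by a `body` of translation units (`tu`), each holding one `tuv`/`seg` pair per language. The sketch below builds such a file with the Python standard library. It is an illustration, not the tool's exporter; the function name `build_tmx` and the header values (`creationtool` and so on) are placeholders chosen for this example.

```python
import xml.etree.ElementTree as ET


def build_tmx(pairs, src_lang="en", tgt_lang="de"):
    """Build a TMX 1.4 tree from (source, target) string pairs."""
    tmx = ET.Element("tmx", version="1.4")
    # Header attributes required by TMX 1.4; the values are placeholders.
    ET.SubElement(tmx, "header", {
        "creationtool": "example-script",
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "plain",
        "adminlang": "en",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


tree = build_tmx([("Electric cars are quiet.", "Elektroautos sind leise.")])
xml_text = ET.tostring(tree.getroot(), encoding="unicode")
```

To write the result to disk, call `tree.write("memory.tmx", encoding="utf-8", xml_declaration=True)`.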

`tmtest variate`

Generate variations for different match percentages.

tmtest variate segments.json --output variations.json

`tmtest match`

Match variations against TM and generate reports.

# JSON report
tmtest match variations.json --tm memory.tmx --output report.json

# HTML report with diff highlighting
tmtest match variations.json --tm memory.tmx --output report.html --format html
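
The idea behind a Levenshtein-based match score can be sketched in a few lines. This example assumes a character-level edit distance normalized by the length of the longer string; the tool may tokenize or normalize differently, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def match_score(a: str, b: str) -> float:
    """Similarity in percent: 100 means identical, 0 means nothing in common."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a, b) / longest)


def best_match(segment: str, tm_sources: list[str]) -> tuple[float, str]:
    """Return (score, TM source) for the closest TM entry; the list must not be empty."""
    return max(((match_score(segment, s), s) for s in tm_sources),
               key=lambda pair: pair[0])
```

Scores computed this way can differ from what a given MT service or CAT tool reports, since each product uses its own fuzzy-matching formula.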

Output Files

File	Description
`source.json`	Generated English text with word count
`segments.json`	Segmented text with segment IDs
`translated.json`	Segments with German translations
`memory.tmx`	TMX file for import into TM systems
`variations.json`	Test variations tagged by intended match category
`report.json`	Match results with scores and statistics
`report.html`	Visual report with diff highlighting
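
Diff highlighting of the kind shown in the HTML report can be approximated with `difflib` from the standard library. This is a sketch of the general technique, not the tool's report code: the word-level comparison, the `<del>`/`<ins>` markup, and the function name `highlight_diff` are assumptions.

```python
import difflib
import html


def highlight_diff(tm_source: str, variant: str) -> str:
    """Mark words removed from the TM source with <del>, added words with <ins>."""
    a, b = tm_source.split(), variant.split()
    out = []
    matcher = difflib.SequenceMatcher(a=a, b=b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(html.escape(" ".join(a[i1:i2])))
            continue
        if tag in ("delete", "replace"):
            out.append("<del>%s</del>" % html.escape(" ".join(a[i1:i2])))
        if tag in ("insert", "replace"):
            out.append("<ins>%s</ins>" % html.escape(" ".join(b[j1:j2])))
    return " ".join(out)
```

For example, comparing "Electric cars are quiet" with "Electric buses are quiet" wraps "cars" in `<del>` and "buses" in `<ins>`, leaving the rest untouched.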

Match Categories

Category	Score Range	Variation Strategy
Exact	100%	Identical segments
Near-exact	95-99%	Punctuation/capitalization changes
High Fuzzy	85-94%	Synonym substitutions
Medium Fuzzy	70-84%	Phrase-level changes
Low Fuzzy	50-69%	Significant rewrites
No Match	0-49%	New content
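
Mapping a score to one of these categories is a simple threshold lookup. The sketch below assumes that each category starts at the lower bound of its range and that fractional scores are not rounded first (so 94.6 counts as High Fuzzy); the tool's exact rounding rules are not documented here.

```python
# (lower bound in percent, category name), highest first
CATEGORIES = [
    (100, "Exact"),
    (95, "Near-exact"),
    (85, "High Fuzzy"),
    (70, "Medium Fuzzy"),
    (50, "Low Fuzzy"),
    (0, "No Match"),
]


def categorize(score: float) -> str:
    """Return the match category for a score between 0 and 100."""
    for lower_bound, name in CATEGORIES:
        if score >= lower_bound:
            return name
    raise ValueError(f"score must be >= 0, got {score}")
```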

TMX Compatibility

The generated TMX files are compatible with:

OmegaT
DeepL Translation Memory
memoQ
SDL Trados
Any TMX 1.4-compliant system

Development

# Install dev dependencies
uv pip install -e ".[dev]"

# Run tests
pytest

# Format code
ruff format src/
ruff check src/ --fix

License

MIT
