fix: tighten German normalization edge cases #1

Open. apples-kksk wants to merge 2 commits into semidark:feat/german-g2p-upstream from apples-kksk:fix-german-normalization-edge-cases

apples-kksk commented May 12, 2026

This is a small follow-up patch for the German G2P PR at hexgrad#97.

It tightens a few normalization edge cases:

  • avoid consuming Uhr inside longer words like Uhrzeit
  • leave invalid times such as 25:00 Uhr and 23:99 Uhr unchanged instead of expanding each number around the colon
  • use Decimal for currency amounts so values like €9,999 round cleanly to zehn Euro rather than neun Euro und einhundert Cent
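The three points above can be sketched roughly as follows. This is a minimal illustration, not the code in misaki/de.py: the names `TIME_RE`, `find_valid_times`, `split_euro_amount`, and `naive_split_euro_amount` are hypothetical, the valid hour range 0-23 is an assumption, and the float-based variant is only one plausible way the old `neun Euro und einhundert Cent` result could arise.

```python
import re
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical pattern: the trailing \b requires "Uhr" to be a whole word,
# so "14:30 Uhrzeit" is not treated as a time expression.
TIME_RE = re.compile(r"\b(\d{1,2}):(\d{2})\s+Uhr\b")


def find_valid_times(text: str) -> list[tuple[int, int]]:
    """Return (hour, minute) pairs for well-formed times followed by 'Uhr'."""
    times = []
    for match in TIME_RE.finditer(text):
        hour, minute = int(match.group(1)), int(match.group(2))
        # Assumption: hours 0-23 and minutes 0-59 count as valid.
        if 0 <= hour <= 23 and 0 <= minute <= 59:
            times.append((hour, minute))
    return times


def split_euro_amount(amount: str) -> tuple[int, int]:
    """Split a German-formatted amount such as '9,999' into (euros, cents).

    Assumes a decimal comma and no thousands separators.
    """
    value = Decimal(amount.replace(",", "."))
    # Round to whole cents first, so 9.999 becomes 10.00 before splitting.
    rounded = value.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    euros = int(rounded)
    cents = int((rounded - euros) * 100)
    return euros, cents


def naive_split_euro_amount(amount: str) -> tuple[int, int]:
    """Float-based variant that rounds the cents only, shown for contrast."""
    value = float(amount.replace(",", "."))
    euros = int(value)
    cents = round((value - euros) * 100)  # can reach 100 without carrying over
    return euros, cents


print(find_valid_times("Um 14:30 Uhr"))             # [(14, 30)]
print(find_valid_times("25:00 Uhr und 23:99 Uhr"))  # []
print(find_valid_times("14:30 Uhrzeit"))            # []
print(split_euro_amount("9,999"))                   # (10, 0)
print(naive_split_euro_amount("9,999"))             # (9, 100)
```

Rounding the whole amount to cents before splitting it into euros and cents is what lets the carry propagate into the euro part.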

Verification:

  • PYTHONDONTWRITEBYTECODE=1 PYTHONPATH=. uv run --isolated --no-cache --no-project --python 3.11 --with pytest python -m pytest -q tests -p no:cacheprovider -> 67 passed, 4 skipped
  • PYTHONDONTWRITEBYTECODE=1 PYTHONPATH=. uv run --isolated --no-cache --no-project --python 3.11 --with pytest --with phonemizer-fork --with espeakng-loader python -m pytest -q tests/test_de.py -p no:cacheprovider -> 71 passed
  • python -m compileall misaki/de.py tests/test_de.py -> passed
  • git diff --check -> passed

I also noticed pyproject.toml adds the de extra, but uv.lock does not yet include it. I could not regenerate the lockfile cleanly because the existing he extra fails resolution for Python 3.12/3.13 via mishkal-hebrew>=0.3.2, so I left the lockfile untouched here.

semidark (Owner) left a comment

Overall this change set looks correct and test coverage for the edge cases is strong. I left two non-blocking suggestions focused on simplifying invalid-time handling and avoiding potential placeholder-token collisions.
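One possible reading of these two suggestions, sketched under assumptions: a single `re.sub` pass with a callback can return the original match for invalid times, so no placeholder tokens are needed to protect them from later rules. The names `TIME_RE` and `expand_times` are hypothetical, the 0-23 hour range is assumed, and `say` stands in for whatever number-to-words function the module actually uses.

```python
import re
from typing import Callable

# Hypothetical pattern; \b after "Uhr" keeps words like "Uhrzeit" untouched.
TIME_RE = re.compile(r"\b(\d{1,2}):(\d{2})\s+Uhr\b")


def expand_times(text: str, say: Callable[[int], str] = str) -> str:
    """Expand valid 'HH:MM Uhr' expressions and leave invalid ones as-is."""

    def replace(match: re.Match) -> str:
        hour, minute = int(match.group(1)), int(match.group(2))
        if hour > 23 or minute > 59:
            # Invalid time: return the matched text unchanged, no placeholder.
            return match.group(0)
        if minute == 0:
            return f"{say(hour)} Uhr"
        return f"{say(hour)} Uhr {say(minute)}"

    return TIME_RE.sub(replace, text)


print(expand_times("Um 14:30 Uhr"))              # Um 14 Uhr 30
print(expand_times("25:00 Uhr"))                 # 25:00 Uhr
print(expand_times("Die Uhrzeit ist 9:05 Uhr.")) # Die Uhrzeit ist 9 Uhr 5.
```

Because the invalid match is consumed and returned verbatim within the same pass, this step cannot re-expand its digits. Whether later normalization rules would still touch them depends on the rest of the pipeline, which this sketch does not model.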

Comment thread misaki/de.py
Comment thread misaki/de.py
