feat: add shifted number-row digit correction support #1
Add coverage for US/UK shifted-digit symbols in inward and outward digit slots, including confidence expectations and positional safety constraints. Also adds compact-input normalization tests for retained shifted symbols and pound-sign alias handling. These tests intentionally fail on the current implementation and define the feature contract.
Preserve supported shifted number-row symbols during input compaction and normalize pound-sign aliases for deterministic single-byte processing. Extend candidate generation to map shifted symbols to digits only in digit-required positions, enforce non-zero N constraints, and apply deterministic shifted penalties (inward -8, outward -14, outward-area -22).
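A minimal PHP sketch of how the shifted penalties could combine, assuming each substitution is tagged with the slot it occupied and that the caller supplies the pre-penalty confidence. The function name `applyShiftedPenalties` and the slot keys are hypothetical, not cikmov's API; only the penalty values, their cumulative behavior, and the 0..100 clamp come from this PR.

```php
<?php

declare(strict_types=1);

// Penalty per shifted-symbol substitution, keyed by a hypothetical slot label.
// Values come from the PR description (inward -8, outward -14, outward-area -22).
const SHIFTED_PENALTIES = [
    'inward' => 8,
    'outward' => 14,
    'outward-area' => 22,
];

/**
 * Subtract one deterministic penalty per substitution, then clamp to 0..100.
 *
 * @param int      $baseConfidence confidence before shifted penalties (assumed input)
 * @param string[] $slots          slot label for each shifted symbol that was replaced
 */
function applyShiftedPenalties(int $baseConfidence, array $slots): int
{
    $confidence = $baseConfidence;
    foreach ($slots as $slot) {
        if (!array_key_exists($slot, SHIFTED_PENALTIES)) {
            throw new InvalidArgumentException("Unknown slot: {$slot}");
        }
        $confidence -= SHIFTED_PENALTIES[$slot]; // penalties are cumulative
    }

    return max(0, min(100, $confidence));
}

echo applyShiftedPenalties(100, ['inward']), PHP_EOL;                 // 92
echo applyShiftedPenalties(100, ['outward', 'inward']), PHP_EOL;      // 78
echo applyShiftedPenalties(30, ['outward-area', 'outward']), PHP_EOL; // 0
```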
Add the full UK/US number-row shifted-symbol mapping table, scope boundaries, and positional rules. Document shifted-digit penalty values and add examples for successful correction and letter-position rejection.
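The mapping and the letter-position rule can be illustrated with a small standalone function. This is a sketch under stated assumptions: the format pattern string (`A` = letter slot, `N` = digit slot) and `correctShiftedDigits` are hypothetical, and the non-zero `N` constraint is not modelled. `SHIFTED_DIGIT_ALIASES` mirrors the constant named in the diff, but its contents here are assumed.

```php
<?php

declare(strict_types=1);

// '£' is two bytes in UTF-8, so it is aliased to '#' before per-byte processing.
const SHIFTED_DIGIT_ALIASES = ['£' => '#'];

// US and UK number-row symbols mapped to the digit on the same key.
const SHIFTED_DIGITS = [
    '!' => '1', '@' => '2', '"' => '2', '#' => '3', '$' => '4', '%' => '5',
    '^' => '6', '&' => '7', '*' => '8', '(' => '9', ')' => '0',
];

/**
 * Hypothetical helper: map shifted symbols to digits only in digit slots ('N')
 * of $pattern. Returns the corrected compact string, or null if any slot fails.
 */
function correctShiftedDigits(string $compact, string $pattern): ?string
{
    $compact = strtr($compact, SHIFTED_DIGIT_ALIASES);
    if (strlen($compact) !== strlen($pattern)) {
        return null;
    }

    $corrected = '';
    for ($i = 0, $n = strlen($compact); $i < $n; $i++) {
        $char = $compact[$i];
        $slot = $pattern[$i];
        if ($slot === 'N' && array_key_exists($char, SHIFTED_DIGITS)) {
            $char = SHIFTED_DIGITS[$char]; // digit slot: use the digit on the same key
        }
        // Letter slots keep the raw character, so a shifted symbol there is rejected.
        $valid = $slot === 'N' ? ctype_digit($char) : ctype_alpha($char);
        if (!$valid) {
            return null;
        }
        $corrected .= $char;
    }

    return $corrected;
}

var_dump(correctShiftedDigits('EC1A!AL', 'AANANAA')); // string(7) "EC1A1AL"
var_dump(correctShiftedDigits('EC1A1A!', 'AANANAA')); // NULL
```

Because a letter slot keeps the raw character, `EC1A 1A!` fails the letter check instead of being guessed into a letter, which matches the letter-position rejection described above.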
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: af55d1c598
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

     $normalized = strtoupper($input);
    -$compact = preg_replace('/[^A-Z0-9]+/', '', $normalized);
    +$normalized = strtr($normalized, self::SHIFTED_DIGIT_ALIASES);
    +$compact = preg_replace('/[^A-Z0-9!@"#$%\^&*()]+/', '', $normalized);

Strip shifted symbols when they are just surrounding noise
Retaining `!@"#$%^&*()` in `compactFromInput` causes previously valid inputs with trailing/leading punctuation to be rejected. For example, `EC1A 1AL!` now compacts to `EC1A1AL!` (length 8), so analysis returns no candidate, whereas the same input was normalized to `EC1A1AL` before this change. Because these characters are common as sentence punctuation or wrappers, this introduces a real regression in format acceptance for otherwise valid postcodes.
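The regression can be reproduced with only the two compaction patterns from the diff (the `£` alias step is omitted because it does not affect this input). `compactBefore` and `compactAfter` are hypothetical names used for illustration.

```php
<?php

declare(strict_types=1);

// Compaction before this PR: keep only A-Z and 0-9.
function compactBefore(string $input): string
{
    return preg_replace('/[^A-Z0-9]+/', '', strtoupper($input)) ?? '';
}

// Compaction after the first commit: shifted number-row symbols are also kept.
function compactAfter(string $input): string
{
    return preg_replace('/[^A-Z0-9!@"#$%\^&*()]+/', '', strtoupper($input)) ?? '';
}

$input = 'EC1A 1AL!';
echo compactBefore($input), PHP_EOL; // prints EC1A1AL (7 characters)
echo compactAfter($input), PHP_EOL;  // prints EC1A1AL! (8 characters, longer than any UK postcode)
```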
Thanks, good catch. Addressed in 4dba702 by trimming shifted symbols only at compact-string boundaries so surrounding punctuation/wrappers are ignored again (`EC1A 1AL!`, `!EC1A 1AL`, `(EC1A 1AL)`), while preserving in-position shifted symbols for deterministic digit correction. Added regression tests in `tests/CikmovTest.php` and `tests/PostcodeRulesTest.php`. Full suite passes: `composer test` -> OK (140 tests, 1165 assertions).
Trim shifted number-row symbols at compact string boundaries so wrappers and trailing punctuation do not block otherwise valid analysis. Preserve in-position shifted symbols for deterministic digit correction and add regression tests for leading/trailing wrappers plus wrapped shifted-correction paths.
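A minimal sketch of boundary trimming, assuming it runs on the compact string right after the character filter from the diff. This standalone `compactFromInput` function only approximates `PostcodeRules::compactFromInput()`; the `SHIFTED_SYMBOLS` constant and the alias contents are assumptions.

```php
<?php

declare(strict_types=1);

// Assumed alias table: '£' becomes single-byte '#'.
const SHIFTED_DIGIT_ALIASES = ['£' => '#'];
// Shifted number-row symbols retained by the filter (used here as a trim() list).
const SHIFTED_SYMBOLS = '!@"#$%^&*()';

/**
 * Sketch: uppercase, alias '£', keep letters/digits/shifted symbols, then trim
 * shifted symbols from both ends so wrappers and trailing punctuation are ignored.
 */
function compactFromInput(string $input): string
{
    $normalized = strtr(strtoupper($input), SHIFTED_DIGIT_ALIASES);
    $compact = preg_replace('/[^A-Z0-9!@"#$%\^&*()]+/', '', $normalized) ?? '';

    return trim($compact, SHIFTED_SYMBOLS);
}

var_dump(compactFromInput('EC1A 1AL!'));  // string(7) "EC1A1AL"
var_dump(compactFromInput('(EC1A 1AL)')); // string(7) "EC1A1AL"
var_dump(compactFromInput('EC1A !AL'));   // string(7) "EC1A!AL"
```

Trimming only the ends does not lose any digit correction, because a well-formed UK postcode always starts with a letter (the outward area) and ends with a letter (the last inward character), so a shifted symbol at either boundary can never occupy a digit slot.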
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 4dba702404

     $normalized = strtoupper($input);
    -$compact = preg_replace('/[^A-Z0-9]+/', '', $normalized);
    +$normalized = strtr($normalized, self::SHIFTED_DIGIT_ALIASES);
    +$compact = preg_replace('/[^A-Z0-9!@"#$%\^&*()]+/', '', $normalized);

Strip extra shifted symbols before candidate generation
Keeping shifted symbols in `compactFromInput` lets an inserted symbol be interpreted as an additional digit instead of noise. For example, `M!1 1AE` compacts to `M!11AE` and is auto-corrected/applied as `M11 1AE` (confidence 86) rather than normalizing back to `M1 1AE`; the same behavior occurs for other short outward patterns when a stray symbol from `!@"#$%^&*()` is inserted. This turns incidental punctuation into a different applied postcode, which is a regression from the prior normalization behavior and goes beyond symbol-for-digit replacement.
Addressed in 7332274. I added a deterministic safety step in `Analyser`: after compaction, if stripping shifted symbols (`!@"#$%^&*()`) yields an already-valid compact postcode, we treat those symbols as inserted noise and accept that canonical value (confidence 100) instead of turning punctuation into a different applied correction. This fixes cases like `M!1 1AE` -> `M1 1AE`, `SW@1A 1AA` -> `SW1A 1AA`, and `EC1A !1AL` -> `EC1A 1AL`. Shifted-symbol substitutions still apply when stripping does not produce a valid compact postcode. Added regression tests in `tests/CikmovTest.php` and `tests/PostcodeRulesTest.php`, and synced the docs in `README.md`. Full suite passes: `composer test` => OK (143 tests, 1175 assertions).
Prefer a stripped-symbol compact form when removing shifted symbols yields an already valid postcode, preventing stray punctuation from being applied as a different corrected postcode. Adds targeted regression coverage for inserted-symbol noise and documents the behavior; keeps shifted substitution paths for true digit-position replacements where stripping does not yield a valid compact postcode.
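The stripped-symbol safety step can be sketched as follows, assuming `£` has already been aliased to `#` during compaction. `isValidCompact` uses a simplified UK postcode regex as a stand-in for cikmov's real grammar checks, and both function names are hypothetical. Per the author's reply, a non-null result would be accepted at confidence 100.

```php
<?php

declare(strict_types=1);

// Shifted symbols that may remain in the compact string ('£' assumed already aliased to '#').
const SHIFTED_SYMBOLS = '!@"#$%^&*()';

// Simplified stand-in for the real grammar: 1-2 letters, a digit,
// an optional letter or digit, then an inward code of one digit and two letters.
function isValidCompact(string $compact): bool
{
    return preg_match('/^[A-Z]{1,2}[0-9][A-Z0-9]?[0-9][A-Z]{2}$/', $compact) === 1;
}

/**
 * If removing every shifted symbol yields a valid compact postcode, treat the
 * symbols as inserted noise and return that value; otherwise return null so the
 * caller falls through to shifted-digit substitution.
 */
function resolveInsertedNoise(string $compact): ?string
{
    $stripped = str_replace(str_split(SHIFTED_SYMBOLS), '', $compact);
    if ($stripped !== $compact && isValidCompact($stripped)) {
        return $stripped;
    }

    return null;
}

var_dump(resolveInsertedNoise('M!11AE'));  // string(5) "M11AE"
var_dump(resolveInsertedNoise('EC1A!AL')); // NULL
```

`EC1A!AL` stays on the substitution path because removing `!` leaves `EC1AAL`, which is not a valid compact postcode.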
Summary
This PR adds deterministic correction support for shifted number-row characters when users type symbols instead of digits in UK postcode inputs.
It implements the requested mapping and constraints while keeping cikmov strict and rule-backed:
- `!` -> `1`
- `@` -> `2`
- `"` -> `2`
- `#` -> `3`
- `£` -> `3`
- `$` -> `4`
- `%` -> `5`
- `^` -> `6`
- `&` -> `7`
- `*` -> `8`
- `(` -> `9`
- `)` -> `0`

What Changed
1) Test-first feature spec coverage
- … `N` positions (`)` -> `0` should not bypass grammar)

2) Runtime implementation
- Updated `PostcodeRules::compactFromInput()` to preserve shifted symbols used by the mapping.
- Added pound-sign alias handling (`£` normalized to `#`) so processing remains single-byte-safe and fully deterministic.
- Added `PostcodeRules::containsDigitLikeCharacter()` and `PostcodeRules::shiftedDigitReplacement()` to centralize shifted-digit handling.
- Extended `Analyser` candidate generation to map shifted symbols only in digit-required positions.

3) Scoring updates
Added explicit shifted-digit penalties:
- inward: `-8`
- outward: `-14`
- outward-area: `-22`

These penalties are deterministic and cumulative, and confidence remains clamped to `0..100`.

Why This Design
Edge Cases Covered
- … (`EC1A 1A!`).
- … `N` position rejects (`SW)A 1AA`).
- … (`EC!A 1A1`: best candidate present, application withheld at default threshold).

Documentation
Updated README with:
Test Evidence
Command:
`composer test`

Result:
OK (135 tests, 1151 assertions)

Commit Breakdown
- `test: specify shifted number-row correction behavior`
- `feat: support shifted number-row digit substitutions`
- `docs: document shifted-digit mapping and scoring`