
feat: add shifted number-row digit correction support#1

Merged: jamiethompson merged 5 commits into main from develop/shifted-digit-correction-support on Feb 22, 2026

@jamiethompson
Owner

Summary

This PR adds deterministic correction support for shifted number-row characters when users type symbols instead of digits in UK postcode inputs.

It implements the requested mapping and constraints while keeping cikmov strict and rule-backed:

  • ! -> 1
  • @ -> 2, " -> 2
  • # -> 3, £ -> 3
  • $ -> 4, % -> 5, ^ -> 6, & -> 7, * -> 8, ( -> 9, ) -> 0
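The mapping above can be written as a plain lookup table. This is a minimal sketch, not the PR's code: the constant name SHIFTED_DIGIT_MAP and the helper shiftedDigitFor() are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical table mirroring the mapping listed in the PR description.
// The real table lives in PostcodeRules and may be shaped differently.
const SHIFTED_DIGIT_MAP = [
    '!' => '1',
    '@' => '2', '"' => '2',          // Shift+2 on US and UK layouts
    '#' => '3', "\u{00A3}" => '3',   // Shift+3 on US and UK layouts (U+00A3 is the pound sign)
    '$' => '4', '%' => '5', '^' => '6',
    '&' => '7', '*' => '8', '(' => '9', ')' => '0',
];

// Returns the digit a shifted symbol stands for, or null if the
// character is not part of the mapping.
function shiftedDigitFor(string $char): ?string
{
    return SHIFTED_DIGIT_MAP[$char] ?? null;
}

var_dump(shiftedDigitFor('!') === '1');   // bool(true)
var_dump(shiftedDigitFor('A') === null);  // bool(true)
```

Because the table is the union of the US and UK layouts, two symbols can map to the same digit, and no layout detection is needed.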

What Changed

1) Test-first feature spec coverage

  • Added behavioural tests for:
    • full shifted mapping in inward digit positions
    • outward shifted-digit correction cases
    • positional safety (no substitution in letter-only slots)
    • zero rejection in N positions: ) -> 0 must not bypass the grammar
    • combined shifted + existing confusion correction scoring
    • idempotency with shifted inputs
  • Added normalization tests to ensure shifted symbols are preserved through compaction.

2) Runtime implementation

  • Extended PostcodeRules::compactFromInput() to preserve shifted symbols used by the mapping.
  • Added deterministic pound alias normalization (£ normalized to #) so processing remains single-byte-safe and fully deterministic.
  • Added PostcodeRules::containsDigitLikeCharacter() and PostcodeRules::shiftedDigitReplacement() to centralize shifted-digit handling.
  • Extended Analyser candidate generation to map shifted symbols only in digit-required positions.
  • Updated class-compatibility filtering so shifted symbols count as digit-like only when valid for the expected token.
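The position-gated substitution described in this list could look roughly like the sketch below. It is an illustration under stated assumptions, not the Analyser's implementation: the pattern alphabet (A = letter slot, D = any digit, N = non-zero digit) and the function name mapShiftedByPosition() are hypothetical, and the input is assumed to be already compacted with £ normalized to #.

```php
<?php
// Hypothetical single-byte mapping (pound sign assumed already aliased to '#').
const SHIFTED_TO_DIGIT = [
    '!' => '1', '@' => '2', '"' => '2', '#' => '3', '$' => '4', '%' => '5',
    '^' => '6', '&' => '7', '*' => '8', '(' => '9', ')' => '0',
];

// Replaces shifted symbols with digits, but only where the pattern expects
// a digit. Returns null when a symbol sits in a letter slot or when ')'
// would put a zero into a non-zero (N) slot.
function mapShiftedByPosition(string $compact, string $pattern): ?string
{
    if (strlen($compact) !== strlen($pattern)) {
        return null;
    }

    $out = '';
    for ($i = 0, $n = strlen($compact); $i < $n; $i++) {
        $char = $compact[$i];
        $digit = SHIFTED_TO_DIGIT[$char] ?? null;

        if ($digit === null) {
            $out .= $char;           // ordinary character, left for the normal grammar checks
            continue;
        }
        if ($pattern[$i] === 'A') {
            return null;             // no substitution in letter-only slots
        }
        if ($pattern[$i] === 'N' && $digit === '0') {
            return null;             // shifted zero cannot satisfy a non-zero slot
        }
        $out .= $digit;
    }

    return $out;
}
```

With the hypothetical pattern AANADAA, the compact input EC!A1AA maps to EC1A1AA, while EC1A1A! and SW)A1AA both return null.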

3) Scoring updates

Added explicit shifted-digit penalties:

  • inward shifted digit: -8
  • outward non-area shifted digit: -14
  • outward area shifted digit: -22

These penalties are deterministic and cumulative, and confidence remains clamped to 0..100.
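The cumulative, clamped scoring can be sketched as follows. The penalty values are taken from the list above; the function name, the slot labels, and the assumption that scoring starts from a base of 100 are hypothetical.

```php
<?php
// Penalty values from the PR description; the keys are illustrative labels.
const SHIFTED_PENALTIES = [
    'inward'       => 8,
    'outward'      => 14,
    'outward_area' => 22,
];

// Subtracts one penalty per shifted-digit substitution and clamps the
// result to the 0..100 range.
function confidenceAfterShiftedPenalties(int $base, array $substitutions): int
{
    $score = $base;
    foreach ($substitutions as $kind) {
        $score -= SHIFTED_PENALTIES[$kind];
    }

    return max(0, min(100, $score));
}

echo confidenceAfterShiftedPenalties(100, ['outward']), "\n";           // 86
echo confidenceAfterShiftedPenalties(100, ['inward', 'outward']), "\n"; // 78
```

If the base really is 100, a single outward non-area substitution yields 86, which agrees with the confidence quoted for the M!1 1AE case in the second review comment on this PR.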

Why This Design

  • Preserves strict grammar behavior: no relaxed letter-slot substitution for shifted symbols.
  • Keeps algorithm deterministic and bounded.
  • Uses explicit mapping only (no layout detection, no heuristics).
  • Maintains existing Result invariants and threshold semantics.

Edge Cases Covered

  • Symbol in inward letter position rejects (EC1A 1A!).
  • Shifted zero in required non-zero outward N position rejects (SW)A 1AA).
  • Combined corrections remain threshold-gated (EC!A 1A1 best candidate present, application withheld at default threshold).
  • Pound symbol support works without multibyte indexing pitfalls.
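The multibyte pitfall mentioned in the last bullet is easy to reproduce. In UTF-8 the pound sign occupies two bytes, so byte-offset indexing would see an extra position unless it is aliased to a single-byte character first. The sketch below assumes UTF-8 input and uses a hypothetical alias table; only the £ to # normalization itself comes from the PR.

```php
<?php
// U+00A3 (pound sign) written as an escape so the example does not depend
// on the encoding of this source file.
$pound = "\u{00A3}";
$input = 'EC' . $pound . 'A 1AA';

// strlen() counts bytes: the pound sign contributes two of them.
var_dump(strlen($pound)); // int(2)
var_dump(strlen($input)); // int(9)

// Aliasing the pound sign to '#' makes every character one byte wide,
// so $normalized[$i] addresses exactly one postcode position.
$normalized = strtr($input, [$pound => '#']);

var_dump($normalized);         // string(8) "EC#A 1AA"
var_dump($normalized[2]);      // string(1) "#"
```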

Documentation

Updated README with:

  • full shifted mapping table
  • layout-union scope rationale
  • positional constraints
  • shifted scoring implications
  • feature examples (valid correction + explicit rejection case)

Test Evidence

Command:

composer test

Result:

  • OK (135 tests, 1151 assertions)

Commit Breakdown

  1. test: specify shifted number-row correction behavior
  2. feat: support shifted number-row digit substitutions
  3. docs: document shifted-digit mapping and scoring

test: specify shifted number-row correction behavior

Add coverage for US/UK shifted-digit symbols in inward and outward digit slots, including confidence expectations and positional safety constraints.

Also adds compact-input normalization tests for retained shifted symbols and pound-sign alias handling. These tests intentionally fail on the current implementation and define the feature contract.

feat: support shifted number-row digit substitutions

Preserve supported shifted number-row symbols during input compaction and normalize pound-sign aliases for deterministic single-byte processing.

Extend candidate generation to map shifted symbols to digits only in digit-required positions, enforce non-zero N constraints, and apply deterministic shifted penalties (inward -8, outward -14, outward-area -22).

docs: document shifted-digit mapping and scoring

Add the full UK/US number-row shifted-symbol mapping table, scope boundaries, and positional rules.

Document shifted-digit penalty values and add examples for successful correction and letter-position rejection.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: af55d1c598

  $normalized = strtoupper($input);
- $compact = preg_replace('/[^A-Z0-9]+/', '', $normalized);
+ $normalized = strtr($normalized, self::SHIFTED_DIGIT_ALIASES);
+ $compact = preg_replace('/[^A-Z0-9!@"#$%\^&*()]+/', '', $normalized);

[P1] Strip shifted symbols when they are just surrounding noise

Retaining !@"#$%^&*() in compactFromInput causes previously valid inputs with trailing/leading punctuation to be rejected. For example, EC1A 1AL! now compacts to EC1A1AL! (length 8), so analysis returns no candidate, whereas the same input was normalized to EC1A1AL before this change. Because these characters are common as sentence punctuation or wrappers, this introduces a real regression in format acceptance for otherwise valid postcodes.

@jamiethompson (Owner, Author)

Thanks, good catch. Addressed in 4dba702 by trimming shifted symbols only at compact-string boundaries so surrounding punctuation/wrappers are ignored again (EC1A 1AL!, !EC1A 1AL, (EC1A 1AL)), while preserving in-position shifted symbols for deterministic digit correction. Added regression tests in tests/CikmovTest.php and tests/PostcodeRulesTest.php. Full suite passes: composer test -> OK (140 tests, 1165 assertions).
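The boundary-trimming fix described in this reply can be sketched as below. The regular expression is the one shown in the review diff; the function name compactWithBoundaryTrim() and the inline alias table are assumptions, and the code in 4dba702 may be structured differently.

```php
<?php
// Characters treated as shifted number-row symbols after pound-sign aliasing.
const SHIFTED_SYMBOLS = '!@"#$%^&*()';

// Compacts the input, keeping shifted symbols, then trims any that ended up
// at the very start or end of the compact string.
function compactWithBoundaryTrim(string $input): string
{
    $normalized = strtoupper($input);
    $normalized = strtr($normalized, ["\u{00A3}" => '#']); // pound alias, as in the PR
    $compact = (string) preg_replace('/[^A-Z0-9!@"#$%\^&*()]+/', '', $normalized);

    // trim() with a character list strips only leading and trailing
    // occurrences, so in-position symbols such as the '!' in EC!A survive.
    return trim($compact, SHIFTED_SYMBOLS);
}

var_dump(compactWithBoundaryTrim('(EC1A 1AL)')); // string(7) "EC1A1AL"
var_dump(compactWithBoundaryTrim('EC!A 1AL'));   // string(7) "EC!A1AL"
```

Trimming only at the boundaries is safe for this grammar because a UK postcode starts and ends with a letter, so a symbol in the first or last position can never stand for a required digit.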

Trim shifted number-row symbols at compact string boundaries so wrappers and trailing punctuation do not block otherwise valid analysis.

Preserve in-position shifted symbols for deterministic digit correction and add regression tests for leading/trailing wrappers plus wrapped shifted-correction paths.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4dba702404

  $normalized = strtoupper($input);
- $compact = preg_replace('/[^A-Z0-9]+/', '', $normalized);
+ $normalized = strtr($normalized, self::SHIFTED_DIGIT_ALIASES);
+ $compact = preg_replace('/[^A-Z0-9!@"#$%\^&*()]+/', '', $normalized);

[P1] Strip extra shifted symbols before candidate generation

Keeping shifted symbols in compactFromInput lets an inserted symbol be interpreted as an additional digit instead of noise. For example, M!1 1AE compacts to M!11AE and is auto-corrected/applied as M11 1AE (confidence 86) rather than normalizing back to M1 1AE; the same behavior occurs for other short outward patterns when a stray !@"#$%^&*() is inserted. This turns incidental punctuation into a different applied postcode, which is a regression from the prior normalization behavior and goes beyond symbol-for-digit replacement.

@jamiethompson (Owner, Author)

Addressed in 7332274. I added a deterministic safety step in Analyser: after compaction, if stripping shifted symbols (!@"#$%^&*()) yields an already-valid compact postcode, we treat those symbols as inserted noise and accept that canonical value (confidence 100) instead of turning punctuation into a different applied correction. This fixes cases like M!1 1AE -> M1 1AE, SW@1A 1AA -> SW1A 1AA, and EC1A !1AL -> EC1A 1AL. Shifted-symbol substitutions still apply when stripping does not produce a valid compact postcode. Added regression tests in tests/CikmovTest.php and tests/PostcodeRulesTest.php, and docs sync in README.md. Full suite passes: composer test => OK (143 tests, 1175 assertions).
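The safety step described in this reply amounts to a "strip first, substitute only if that fails" rule. The sketch below illustrates the idea under loud assumptions: looksLikeCompactPostcode() is a deliberately simplified stand-in for the library's grammar, and both function names are hypothetical.

```php
<?php
const NOISE_SYMBOLS = '!@"#$%^&*()';

// Simplified validity check for illustration only: one or two area letters,
// a non-zero digit, an optional digit or letter, then digit + two letters.
// The real grammar in PostcodeRules is stricter.
function looksLikeCompactPostcode(string $compact): bool
{
    return preg_match('/^[A-Z]{1,2}[1-9][0-9A-Z]?[0-9][A-Z]{2}$/', $compact) === 1;
}

// If removing every shifted symbol leaves an already-valid compact postcode,
// the symbols are treated as inserted noise and the stripped value is
// returned. Otherwise null is returned, meaning the caller should fall back
// to position-gated symbol-for-digit substitution.
function stripInsertedNoise(string $compact): ?string
{
    $stripped = str_replace(str_split(NOISE_SYMBOLS), '', $compact);

    if ($stripped !== $compact && looksLikeCompactPostcode($stripped)) {
        return $stripped;
    }

    return null;
}

var_dump(stripInsertedNoise('M!11AE'));  // string(5) "M11AE"
var_dump(stripInsertedNoise('EC!A1AA')); // NULL
```

In the second call, stripping leaves ECA1AA, which is not a valid compact postcode, so the symbol is presumed to replace a digit and the substitution path still applies.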

Prefer a stripped-symbol compact form when removing shifted symbols yields an already valid postcode, preventing stray punctuation from being applied as a different corrected postcode.

Adds targeted regression coverage for inserted-symbol noise and documents the behavior; keeps shifted substitution paths for true digit-position replacements where stripping does not yield a valid compact postcode.
Repository owner deleted a comment from chatgpt-codex-connector Bot Feb 22, 2026
@jamiethompson jamiethompson merged commit 98357dd into main Feb 22, 2026
2 checks passed