Merged
54 changes: 54 additions & 0 deletions README.md
@@ -27,6 +27,38 @@ Why:
- behaviour stays explainable, reproducible, and testable
- correction risk is lower when every decision is rule-backed

## Shifted Number-Row Digit Support

`cikmov` supports deterministic correction when shifted number-row symbols are typed instead of digits.

Supported substitutions:

```text
! -> 1
@ -> 2
" -> 2
# -> 3
£ -> 3
$ -> 4
% -> 5
^ -> 6
& -> 7
* -> 8
( -> 9
) -> 0
```

Scope rules:

- mapping is the union of UK + US number-row shifted symbols (including Irish usage of UK layout)
- no keyboard-layout detection is performed at runtime
- substitutions are attempted only where grammar requires digits:
  - outward digit positions
  - district digit positions
  - inward first character
- substitutions are not attempted in letter-only positions
- when stripping shifted symbols produces an already-valid compact postcode, symbols are treated as noise and not as digit substitutions
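A minimal sketch of the position gating described above. It assumes the grammar tokens that appear in `Analyser` (`L` for a letter-only slot, `D` for any digit, `N` for a non-zero digit); the helper `shiftedDigitForSlot()` is hypothetical and not part of the public API.

```php
<?php
declare(strict_types=1);

// Mirrors the substitution table above (UK + US number-row symbols).
const SHIFTED_TO_DIGIT = [
    '!' => '1', '@' => '2', '"' => '2', '#' => '3', "\u{00A3}" => '3',
    '$' => '4', '%' => '5', '^' => '6', '&' => '7', '*' => '8',
    '(' => '9', ')' => '0',
];

/**
 * Hypothetical helper: returns the digit a shifted symbol stands for,
 * or null when no substitution may be attempted at this grammar slot.
 * Token letters are an assumption modelled on the diff.
 */
function shiftedDigitForSlot(string $symbol, string $token): ?string
{
    // Letter-only positions never accept a shifted-digit substitution.
    if ($token === 'L') {
        return null;
    }

    $digit = SHIFTED_TO_DIGIT[$symbol] ?? null;

    // A non-zero slot ('N') must not accept ')' -> '0'.
    if ($digit === '0' && $token === 'N') {
        return null;
    }

    return $digit;
}
```

Under these assumptions `shiftedDigitForSlot('!', 'D')` yields `'1'`, while `shiftedDigitForSlot(')', 'N')` and any call with token `'L'` yield `null`.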

## Public API

```php
@@ -159,6 +191,10 @@ Scoring policy:
- this reflects higher structural significance of outward geography encoding
- ambiguity lowers confidence further
- alternatives are capped at 5 entries for bounded output size
- shifted number-row symbol penalties:
  - inward digit substitution: `-8`
  - outward non-area digit substitution: `-14`
  - outward area digit substitution: `-22` (reserved for completeness; current grammar does not place digits in outward area-letter slots)

Ambiguity application policy:

@@ -252,6 +288,24 @@ $result = Cikmov::analyse('EC1A 1AI');
// no correction is applied
```

### 6) Shifted-digit correction

```php
$result = Cikmov::analyse('EC1A !AL');
// bestCandidate: "EC1A 1AL"
// confidence: 92
// appliedPostcode: "EC1A 1AL"
```
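The `confidence: 92` above is consistent with the inward shifted-digit penalty of `-8` from the scoring policy. A minimal sketch of the penalty selection, assuming (this is not documented here) that confidence starts at 100, penalties are subtracted, no ambiguity penalty applies, and the result is floored at 0; `confidenceAfter()` is a hypothetical helper.

```php
<?php
declare(strict_types=1);

// Penalty values from the scoring policy.
const INWARD_SHIFTED_DIGIT_PENALTY = 8;
const OUTWARD_SHIFTED_DIGIT_PENALTY = 14;
const OUTWARD_SHIFTED_DIGIT_AREA_PENALTY = 22;

function shiftedDigitPenalty(bool $outward, bool $isOutwardAreaPosition): int
{
    if (!$outward) {
        return INWARD_SHIFTED_DIGIT_PENALTY;
    }

    return $isOutwardAreaPosition
        ? OUTWARD_SHIFTED_DIGIT_AREA_PENALTY
        : OUTWARD_SHIFTED_DIGIT_PENALTY;
}

// Hypothetical: base 100 minus all penalties, never below 0 (assumed rule).
function confidenceAfter(int ...$penalties): int
{
    return max(0, 100 - array_sum($penalties));
}

// 'EC1A !AL': one inward substitution ('!' -> '1').
$confidence = confidenceAfter(shiftedDigitPenalty(false, false)); // 92
```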

### 7) Shifted symbol in letter position is rejected

```php
$result = Cikmov::analyse('EC1A 1A!');
// invalid: no shifted-digit substitution in letter-only positions
// bestCandidate: null
// appliedPostcode: null
```

## Embedded Postcode Areas

The full area set is embedded and enforced:
78 changes: 69 additions & 9 deletions src/Internal/Analyser.php
@@ -11,6 +11,9 @@ final class Analyser
{
private const OUTWARD_SUBSTITUTION_BASE_PENALTY = 8;
private const INWARD_SUBSTITUTION_BASE_PENALTY = 4;
private const OUTWARD_SHIFTED_DIGIT_AREA_PENALTY = 22;
private const OUTWARD_SHIFTED_DIGIT_PENALTY = 14;
private const INWARD_SHIFTED_DIGIT_PENALTY = 8;
private const TIE_AMBIGUITY_PENALTY = 15;
private const NEAR_AMBIGUITY_PENALTY = 6;
private const ALTERNATIVE_SCORE_WINDOW = 4;
@@ -80,7 +83,22 @@ public static function analyse(string $input, int $minConfidenceToApply): Result
);
}

-if (!preg_match('/[A-Z]/', $compact) || !preg_match('/[0-9]/', $compact)) {
$compactWithoutShiftedSymbols = PostcodeRules::stripShiftedDigitSymbols($compact);
if ($compactWithoutShiftedSymbols !== $compact && PostcodeRules::isValidCompact($compactWithoutShiftedSymbols)) {
$canonical = PostcodeRules::formatCompact($compactWithoutShiftedSymbols);

return new Result(
input: $input,
normalizedInput: $canonical,
inputWasValid: true,
bestCandidate: $canonical,
confidence: 100,
appliedPostcode: $canonical,
alternatives: []
);
}

if (!preg_match('/[A-Z]/', $compact) || !PostcodeRules::containsDigitLikeCharacter($compact)) {
return new Result(
input: $input,
normalizedInput: $normalizedInput,
@@ -206,11 +224,17 @@ private static function generateCandidates(string $compact): array
continue;
}

$areaLength = str_starts_with($pattern, 'AA') ? 2 : 1;
$optionsByPosition = [];
$isPatternViable = true;

foreach ($outwardTokens as $position => $token) {
-$options = self::optionsForCharacter($outwardInput[$position], $token, true);
$options = self::optionsForCharacter(
character: $outwardInput[$position],
expectedToken: $token,
outward: true,
isOutwardAreaPosition: $position < $areaLength
);
if ($options === []) {
$isPatternViable = false;
break;
@@ -273,12 +297,23 @@ private static function isClassCompatibleOutward(string $outward, string $patter
return false;
}

-if ($token === 'D' && !ctype_digit($character)) {
-return false;
-}
if ($token !== 'L') {
if (ctype_digit($character)) {
if ($token === 'N' && $character === '0') {
return false;
}

-if ($token === 'N' && (!ctype_digit($character) || $character === '0')) {
-return false;
continue;
}

$shiftedDigit = PostcodeRules::shiftedDigitReplacement($character);
if ($shiftedDigit === null) {
return false;
}

if ($token === 'N' && $shiftedDigit === '0') {
return false;
}
}
}

@@ -315,8 +350,12 @@ private static function walkCandidateOptions(
/**
* @return list<array{char:string,penalty:int}>
*/
-private static function optionsForCharacter(string $character, string $expectedToken, bool $outward): array
-{
private static function optionsForCharacter(
string $character,
string $expectedToken,
bool $outward,
bool $isOutwardAreaPosition = false
): array {
$basePenalty = $outward ? self::OUTWARD_SUBSTITUTION_BASE_PENALTY : self::INWARD_SUBSTITUTION_BASE_PENALTY;
$options = [];

@@ -346,6 +385,14 @@ private static function optionsForCharacter(string $character, string $expectedT
$options[] = ['char' => $replacement, 'penalty' => $basePenalty + $extraPenalty];
}
}

$shiftedDigit = PostcodeRules::shiftedDigitReplacement($character);
if ($shiftedDigit !== null && ($expectedToken !== 'N' || $shiftedDigit !== '0')) {
$options[] = [
'char' => $shiftedDigit,
'penalty' => self::shiftedDigitPenalty($outward, $isOutwardAreaPosition),
];
}
}

$deduplicated = [];
@@ -371,4 +418,17 @@ private static function optionsForCharacter(string $character, string $expectedT

return $finalOptions;
}

private static function shiftedDigitPenalty(bool $outward, bool $isOutwardAreaPosition): int
{
if (!$outward) {
return self::INWARD_SHIFTED_DIGIT_PENALTY;
}

if ($isOutwardAreaPosition) {
return self::OUTWARD_SHIFTED_DIGIT_AREA_PENALTY;
}

return self::OUTWARD_SHIFTED_DIGIT_PENALTY;
}
}
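The noise-first step added to `analyse()` can be sketched in isolation: strip every shifted symbol, and only if that yields a valid postcode accept it as-is; otherwise fall through to digit substitution. This is a minimal sketch: `isValidCompactSketch()` is a hypothetical, deliberately loose stand-in for `PostcodeRules::isValidCompact()` that checks only the broad UK shape and ignores the embedded area list and per-position letter rules.

```php
<?php
declare(strict_types=1);

const SHIFTED_SYMBOLS = '!@"#$%^&*()';

// Hypothetical, simplified validity check (shape only, no area list).
function isValidCompactSketch(string $compact): bool
{
    return preg_match('/^[A-Z]{1,2}[0-9][A-Z0-9]?[0-9][A-Z]{2}$/', $compact) === 1;
}

// Same approach as stripShiftedDigitSymbols() in the diff.
function stripShiftedSymbols(string $compact): string
{
    return str_replace(str_split(SHIFTED_SYMBOLS), '', $compact);
}

/**
 * Returns the stripped compact postcode when removing every shifted
 * symbol already yields a valid postcode; otherwise null, meaning the
 * caller should continue with candidate generation and substitutions.
 */
function acceptIfSymbolsAreNoise(string $compact): ?string
{
    $stripped = stripShiftedSymbols($compact);

    if ($stripped !== $compact && isValidCompactSketch($stripped)) {
        return $stripped;
    }

    return null;
}
```

With this stand-in, `M!11AE` resolves to `M11AE` (displayed as `M1 1AE`), whereas `EC1A!AL` returns `null` because `EC1AAL` is not a postcode, so the `!` proceeds to digit substitution.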
48 changes: 46 additions & 2 deletions src/Internal/PostcodeRules.php
@@ -15,6 +15,28 @@ final class PostcodeRules
private const FORBIDDEN_FIRST_OUTWARD_LETTERS = 'QVX';
private const FORBIDDEN_SECOND_OUTWARD_LETTERS = 'IJZ';
private const AA9A_ALLOWED_FINAL_LETTERS = 'ABEHMNPRVWXY';
private const SHIFTED_DIGIT_SYMBOLS = '!@"#$%^&*()';
private const SHIFTED_DIGIT_ALIASES = [
"\u{00A3}" => '#',
];

/**
* @var array<string, string>
*/
private const SHIFTED_DIGIT_TO_DIGIT = [
'!' => '1',
'@' => '2',
'"' => '2',
'#' => '3',
"\u{00A3}" => '3',
'$' => '4',
'%' => '5',
'^' => '6',
'&' => '7',
'*' => '8',
'(' => '9',
')' => '0',
];

/**
* @var array<string, list<string>>
@@ -172,9 +194,31 @@ final class PostcodeRules
public static function compactFromInput(string $input): string
{
$normalized = strtoupper($input);
-$compact = preg_replace('/[^A-Z0-9]+/', '', $normalized);
$normalized = strtr($normalized, self::SHIFTED_DIGIT_ALIASES);
$compact = preg_replace('/[^A-Z0-9!@"#$%\^&*()]+/', '', $normalized);

P1 (review comment): Strip shifted symbols when they are just surrounding noise

Retaining `!@"#$%^&*()` in `compactFromInput` causes previously valid inputs with leading or trailing punctuation to be rejected. For example, `EC1A 1AL!` now compacts to `EC1A1AL!` (length 8), so analysis returns no candidate, whereas the same input was normalized to `EC1A1AL` before this change. Because these characters are common as sentence punctuation or wrappers, this introduces a real regression in format acceptance for otherwise valid postcodes.

Reply from the repository owner (PR author):

Thanks, good catch. Addressed in 4dba702 by trimming shifted symbols only at compact-string boundaries, so surrounding punctuation and wrappers are ignored again (`EC1A 1AL!`, `!EC1A 1AL`, `(EC1A 1AL)`), while in-position shifted symbols are preserved for deterministic digit correction. Added regression tests in `tests/CikmovTest.php` and `tests/PostcodeRulesTest.php`. Full suite passes: `composer test` -> OK (140 tests, 1165 assertions).
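The regression and the boundary-trim fix can be reproduced with plain PHP string functions. In this sketch the two regular expressions and the `trim()` character list are taken from the diff; `compactSketch()` and its two flags are hypothetical, added only to compare the before and after behaviour.

```php
<?php
declare(strict_types=1);

const SHIFTED_SYMBOLS = '!@"#$%^&*()';

// Hypothetical comparison helper, not the library's compactFromInput().
function compactSketch(string $input, bool $keepShifted, bool $trimBoundaries): string
{
    $normalized = strtoupper($input);

    // Old filter drops every non-alphanumeric; the new one keeps shifted symbols.
    $pattern = $keepShifted ? '/[^A-Z0-9!@"#$%\^&*()]+/' : '/[^A-Z0-9]+/';
    $compact = preg_replace($pattern, '', $normalized) ?? '';

    // trim() treats its second argument as a set of characters, so runs of
    // shifted symbols at either end are removed while inner ones survive.
    return $trimBoundaries ? trim($compact, SHIFTED_SYMBOLS) : $compact;
}
```

`compactSketch('EC1A 1AL!', true, false)` returns the 8-character `EC1A1AL!` that triggered the report; enabling `$trimBoundaries` returns `EC1A1AL` again while leaving the inner `!` of `EC1A !AL` in place for substitution.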


P1 (review comment): Strip extra shifted symbols before candidate generation

Keeping shifted symbols in `compactFromInput` lets an inserted symbol be interpreted as an additional digit instead of noise. For example, `M!1 1AE` compacts to `M!11AE` and is auto-corrected and applied as `M11 1AE` (confidence 86) rather than normalizing back to `M1 1AE`; the same behavior occurs for other short outward patterns when a stray `!@"#$%^&*()` character is inserted. This turns incidental punctuation into a different applied postcode, which is a regression from the prior normalization behavior and goes beyond symbol-for-digit replacement.

Reply from the repository owner (PR author):

Addressed in 7332274. I added a deterministic safety step in `Analyser`: after compaction, if stripping shifted symbols (`!@"#$%^&*()`) yields an already-valid compact postcode, we treat those symbols as inserted noise and accept that canonical value (confidence 100) instead of turning punctuation into a different applied correction. This fixes cases like `M!1 1AE` -> `M1 1AE`, `SW@1A 1AA` -> `SW1A 1AA`, and `EC1A !1AL` -> `EC1A 1AL`. Shifted-symbol substitutions still apply when stripping does not produce a valid compact postcode. Added regression tests in `tests/CikmovTest.php` and `tests/PostcodeRulesTest.php`, and synced the docs in `README.md`. Full suite passes: `composer test` => OK (143 tests, 1175 assertions).

if ($compact === null || $compact === '') {
return '';
}

-return $compact ?? '';
// Shifted symbols can validly stand in for digits, but never at postcode boundaries.
$compact = trim($compact, self::SHIFTED_DIGIT_SYMBOLS);

return $compact;
}

public static function containsDigitLikeCharacter(string $compact): bool
{
return strpbrk($compact, '0123456789' . self::SHIFTED_DIGIT_SYMBOLS) !== false;
}

public static function shiftedDigitReplacement(string $character): ?string
{
return self::SHIFTED_DIGIT_TO_DIGIT[$character] ?? null;
}

public static function stripShiftedDigitSymbols(string $compact): string
{
return str_replace(str_split(self::SHIFTED_DIGIT_SYMBOLS), '', $compact);
}

public static function displayFromCompact(string $compact): string
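The two small helpers in this hunk rely on byte-level string functions. A sketch of the same behaviour with hypothetical function names (`normalizeAliases()`, `containsDigitLike()`): `strtr()` maps the two-byte UTF-8 `£` to `#` before filtering, and `strpbrk()` reports whether any digit or shifted symbol is present.

```php
<?php
declare(strict_types=1);

const SHIFTED_SYMBOLS = '!@"#$%^&*()';
const SHIFTED_ALIASES = ["\u{00A3}" => '#'];

// Hypothetical: uppercase, then alias the UK pound sign to '#'.
function normalizeAliases(string $input): string
{
    // strtr() with an array matches the whole two-byte UTF-8 sequence for
    // '£', so no multibyte functions are needed for this replacement.
    return strtr(strtoupper($input), SHIFTED_ALIASES);
}

// Hypothetical equivalent of containsDigitLikeCharacter().
function containsDigitLike(string $compact): bool
{
    // strpbrk() returns false when none of the listed characters occur.
    return strpbrk($compact, '0123456789' . SHIFTED_SYMBOLS) !== false;
}
```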