Migration runner sorts files lexically but gates on numeric id: 3-digit migrations silently skip earlier ones #197

Migration files are named `NN__description.yaml`, and the runner derives each migration id by parsing the numeric prefix (`strconv.Atoi(strings.Split(k, "__")[0])`). Applied state is tracked as a single numeric high-water mark, and a file runs only when its parsed id is strictly greater than that mark.

But the run order is decided by `sort.Strings(keys)` — i.e. **lexical** order over the filenames — while the apply gate is **numeric**. These two orderings agree only as long as every id has the same number of digits (e.g. all two-digit `00`–`99`).

## The bug

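As soon as a project adds its first three-digit migration (`100__…`), lexical and numeric order diverge, because `"100__"` sorts *before* every one of `"10__"`…`"99__"`. `sort.Strings` compares byte by byte. Against `"10__"`, the first difference is the third byte: `"100__"` has `0` (0x30) where `"10__"` has `_` (0x5F), and every digit `0`–`9` (0x30–0x39) sorts below `_`. Against `"11__"`…`"99__"` the difference comes even earlier (e.g. `0` < `1` in the second byte, or `1` < `2` in the first). `"101__"` lands right after `"100__"` for the same reasons, while `"01__"`…`"09__"` still sort first because `0` < `1`.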
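To see the ordering concretely, here is a minimal Go sketch that runs `sort.Strings` over a few filenames on both sides of the two-/three-digit boundary. `lexicalOrder` is a hypothetical helper for illustration (not part of the runner), and the filenames are made up.

```go
package main

import (
	"fmt"
	"sort"
)

// lexicalOrder returns a sorted copy of keys using sort.Strings, i.e. the
// byte-wise order the runner currently uses to decide run order.
// Hypothetical helper for illustration only.
func lexicalOrder(keys []string) []string {
	out := append([]string(nil), keys...)
	sort.Strings(out)
	return out
}

func main() {
	keys := []string{"99__d.yaml", "10__b.yaml", "101__f.yaml", "09__a.yaml", "100__e.yaml", "11__c.yaml"}
	fmt.Println(lexicalOrder(keys))
	// [09__a.yaml 100__e.yaml 101__f.yaml 10__b.yaml 11__c.yaml 99__d.yaml]
}
```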

On a **fresh database** (high-water mark = 0) the runner therefore:

1. applies `01__`…`09__` (mark advances to 9),
2. then hits `100__`, `101__` next in lexical order and applies them — advancing the numeric mark to 101,
3. then reaches `10__`…`99__`, whose ids are all now **less than** the mark, so every one of them is **silently skipped**.

No error is raised — the runner just records 100/101 as applied and reports success. The result is a database missing every object created in migrations 10–99 (tables, columns, indexes), which then surfaces as confusing downstream runtime errors rather than a migration failure.

Databases migrated **incrementally** (mark already ≥ 99 before 100 was introduced) are unaffected, which makes this especially nasty: it only bites fresh environments — new developer setups, CI, ephemeral test databases — while existing/long-lived databases look fine. So it passes locally for whoever has been migrating all along and fails only for everyone starting clean.

## Reproduction (conceptual)

Given migration files `01__…` through `101__…` and an empty database:

- expected: ids applied in order 1, 2, …, 99, 100, 101
- actual: 1..9, then 100, 101, then 10..99 skipped (mark jumped to 101)

## Suggested fix

Order the keys by their **parsed numeric id** (the same value the apply gate uses), not by `sort.Strings`. A stable numeric sort with a lexical tie-break keeps ordering deterministic and makes the run order match the gate regardless of digit count, so zero-padding is no longer load-bearing.
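A minimal sketch of the numeric-sort approach, assuming the `NN__description.yaml` naming above. `migrationID` and `sortByID` are hypothetical names; the sketch swaps `strings.Split` for `strings.Cut` (Go 1.18+) only so a missing separator produces an explicit error, and uses `sort.SliceStable` with a lexical tie-break as suggested.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// migrationID parses the numeric prefix before "__", i.e. the same value the
// apply gate compares against the high-water mark. Hypothetical helper.
func migrationID(name string) (int, error) {
	prefix, _, ok := strings.Cut(name, "__")
	if !ok {
		return 0, fmt.Errorf("migration %q has no \"__\" separator", name)
	}
	return strconv.Atoi(prefix)
}

// sortByID orders keys in place by parsed numeric id, falling back to lexical
// order when two files share an id, so run order always matches the gate.
func sortByID(keys []string) error {
	ids := make(map[string]int, len(keys))
	for _, k := range keys {
		id, err := migrationID(k)
		if err != nil {
			return err
		}
		ids[k] = id
	}
	sort.SliceStable(keys, func(i, j int) bool {
		if ids[keys[i]] != ids[keys[j]] {
			return ids[keys[i]] < ids[keys[j]] // primary key: numeric id
		}
		return keys[i] < keys[j] // tie-break: filename, for determinism
	})
	return nil
}

func main() {
	keys := []string{"09__a.yaml", "100__e.yaml", "101__f.yaml", "10__b.yaml", "99__d.yaml"}
	if err := sortByID(keys); err != nil {
		panic(err)
	}
	fmt.Println(keys)
	// [09__a.yaml 10__b.yaml 99__d.yaml 100__e.yaml 101__f.yaml]
}
```

The tie-break only matters when two files parse to the same id (e.g. `7__x.yaml` and `07__y.yaml`); it keeps that case deterministic rather than dependent on directory listing order.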

Alternatively (or additionally), if lexical ordering is intended to remain authoritative, the requirement that all migration ids be **fixed-width / zero-padded** should be documented and ideally validated at startup (fail fast if a prefix width regresses), since the failure is otherwise silent.
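For the fixed-width alternative, a startup check could look like the sketch below. `checkFixedWidth` is a hypothetical helper; it assumes the prefix is everything before the first `__` and that the runner would call it on the key list before sorting. Equal-width all-digit prefixes are exactly the case in which `sort.Strings` order and numeric order agree.

```go
package main

import (
	"fmt"
	"strings"
)

// checkFixedWidth fails fast unless every migration prefix is all digits and
// has the same width as the first one. Hypothetical helper for illustration.
func checkFixedWidth(keys []string) error {
	width := -1
	for _, k := range keys {
		prefix, _, ok := strings.Cut(k, "__")
		if !ok {
			return fmt.Errorf("migration %q has no \"__\" separator", k)
		}
		// Trimming all digits leaves "" only if the prefix is purely numeric.
		if prefix == "" || strings.Trim(prefix, "0123456789") != "" {
			return fmt.Errorf("migration %q has a non-numeric id %q", k, prefix)
		}
		switch {
		case width == -1:
			width = len(prefix) // first file fixes the expected width
		case len(prefix) != width:
			return fmt.Errorf("migration %q has a %d-digit id, expected %d digits (pad all ids to the same width)",
				k, len(prefix), width)
		}
	}
	return nil
}

func main() {
	fmt.Println(checkFixedWidth([]string{"01__a.yaml", "02__b.yaml", "99__c.yaml"})) // <nil>
	fmt.Println(checkFixedWidth([]string{"01__a.yaml", "99__c.yaml", "100__d.yaml"}))
}
```

Since ids are parsed numerically, padding existing files to a wider width (e.g. renaming `01__…` to `001__…`) would keep their ids unchanged, so the high-water mark stored in existing databases should stay valid, assuming nothing else keys on the filename.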

I am happy to open a PR implementing the numeric-sort approach.
