Background
CaveWhere stamps fresh UUIDs onto saved objects in two distinct situations:
1. Construction-time: cwShot, cwScrap, cwNote, cwNoteLiDAR, cwSketch, cwTrip, cwTeamMember, cwLead, cwFixStation, cwNoteStation, cwNoteLiDARStation, etc. all call QUuid::createUuid() in their constructors. This is correct: a user creating a new object should get an identity distinct from anyone else's.
2. Identity repair / migration: cwSaveLoad's repairedTopLevelId, regenerateNoteSubtreeIds, regenerateNoteLiDARSubtreeIds, regenerateSketchSubtreeIds, and regenerateTripSubtreeIds stamp UUIDs onto already-existing data when the load pipeline detects missing or duplicate IDs (e.g., loading a legacy project saved before UUIDs were added, or one with a corrupted ID field).
The problem is in (2). Two collaborators who independently load the same legacy .cwproj (or the same not-yet-migrated entity tree) and save it each generate different random UUIDs for what is logically the same object. When their branches merge, every repaired object appears twice: Git has no way to recognize the two copies as one entity, sync-merge handlers can't pair them, and the user is stuck deduplicating by hand.
Proposal
Replace QUuid::createUuid() with QUuid::createUuidV5(namespace, stableName) in the identity-repair / migration paths. Qt 6.8+ provides this directly, and we're on 6.10+, so no new helper is needed. V5 UUIDs are deterministic, SHA-1-derived UUIDs: given the same namespace and name, every caller produces the same UUID. Two collaborators repairing the same project therefore converge on identical UUIDs, and the repaired data they save is identical, so a merge sees one object instead of two.
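To make the determinism concrete, here is a dependency-free sketch of what an RFC 4122 version-5 derivation computes: SHA-1 over the namespace UUID's 16 bytes followed by the name's bytes, truncated to 16 bytes, with the version and variant bits overwritten. It is for illustration only; CaveWhere would call QUuid::createUuidV5 rather than carry its own SHA-1, and the names Uuid, sha1, uuidV5, and toString are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

using Uuid = std::array<std::uint8_t, 16>;

// Minimal SHA-1 (FIPS 180-4) over a byte buffer, returning the 20-byte digest.
std::array<std::uint8_t, 20> sha1(std::vector<std::uint8_t> msg) {
    auto rotl = [](std::uint32_t x, int n) -> std::uint32_t { return (x << n) | (x >> (32 - n)); };
    std::uint32_t h[5] = {0x67452301u, 0xEFCDAB89u, 0x98BADCFEu, 0x10325476u, 0xC3D2E1F0u};

    // Padding: 0x80, zeros up to 56 mod 64, then the bit length as 64-bit big-endian.
    const std::uint64_t bitLen = static_cast<std::uint64_t>(msg.size()) * 8;
    msg.push_back(0x80);
    while (msg.size() % 64 != 56) msg.push_back(0x00);
    for (int i = 7; i >= 0; --i) msg.push_back(static_cast<std::uint8_t>(bitLen >> (8 * i)));

    for (std::size_t off = 0; off < msg.size(); off += 64) {
        std::uint32_t w[80];
        for (int i = 0; i < 16; ++i) {
            const std::size_t p = off + 4 * i;
            w[i] = (std::uint32_t(msg[p]) << 24) | (std::uint32_t(msg[p + 1]) << 16) |
                   (std::uint32_t(msg[p + 2]) << 8) | std::uint32_t(msg[p + 3]);
        }
        for (int i = 16; i < 80; ++i) w[i] = rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);

        std::uint32_t a = h[0], b = h[1], c = h[2], d = h[3], e = h[4];
        for (int i = 0; i < 80; ++i) {
            std::uint32_t f, k;
            if (i < 20)      { f = (b & c) | (~b & d);          k = 0x5A827999u; }
            else if (i < 40) { f = b ^ c ^ d;                   k = 0x6ED9EBA1u; }
            else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8F1BBCDCu; }
            else             { f = b ^ c ^ d;                   k = 0xCA62C1D6u; }
            const std::uint32_t t = rotl(a, 5) + f + e + k + w[i];
            e = d; d = c; c = rotl(b, 30); b = a; a = t;
        }
        h[0] += a; h[1] += b; h[2] += c; h[3] += d; h[4] += e;
    }

    std::array<std::uint8_t, 20> out{};
    for (int i = 0; i < 5; ++i)
        for (int j = 0; j < 4; ++j) out[4 * i + j] = static_cast<std::uint8_t>(h[i] >> (24 - 8 * j));
    return out;
}

// RFC 4122 version-5 UUID: SHA-1(namespace bytes || name bytes), first 16 bytes,
// with the version nibble set to 5 and the variant bits set to 10xx.
Uuid uuidV5(const Uuid& ns, const std::string& name) {
    std::vector<std::uint8_t> data(ns.begin(), ns.end());
    data.insert(data.end(), name.begin(), name.end());
    const auto digest = sha1(std::move(data));
    Uuid u{};
    for (int i = 0; i < 16; ++i) u[i] = digest[i];
    u[6] = static_cast<std::uint8_t>((u[6] & 0x0F) | 0x50);
    u[8] = static_cast<std::uint8_t>((u[8] & 0x3F) | 0x80);
    return u;
}

// Canonical 8-4-4-4-12 lowercase hex form.
std::string toString(const Uuid& u) {
    static const char* hex = "0123456789abcdef";
    std::string s;
    for (int i = 0; i < 16; ++i) {
        if (i == 4 || i == 6 || i == 8 || i == 10) s += '-';
        s += hex[u[i] >> 4];
        s += hex[u[i] & 0x0F];
    }
    return s;
}
```

Hashing the RFC 4122 DNS namespace (6ba7b810-9dad-11d1-80b4-00c04fd430c8) with the name "python.org" this way gives 886313e1-3b8a-5372-9b90-0c9aee199e5d, the value Python's uuid.uuid5 documents for the same inputs. Any two machines feeding in the same namespace and name get the same bytes, which is exactly the convergence property the proposal relies on.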
Scope
Audit and update these call sites in cavewherelib/src/cwSaveLoad.cpp:
- repairedTopLevelId (around line 615): Cave, Trip, and Note top-level repair.
- regenerateNoteSubtreeIds (around line 632): note + scraps + scrap stations + leads.
- regenerateNoteLiDARSubtreeIds (around line 646): lidar note + lidar stations.
- regenerateSketchSubtreeIds (around line 654): sketch + strokes.
- regenerateTripSubtreeIds (around line 662): trip + all child note subtrees.
The natural namespace is the project's own UUID (`projectMetadata.projectId`). The natural name is whatever string is stable for that entity within the project — e.g. cave name, trip name within cave, note's image-file relative path, scrap index within note, station name within scrap, etc. Each entity type needs a deliberate decision; this is where the audit work lives.
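One possible shape for the stable name, assuming the components listed above (cave name, trip name within cave, note image path, scrap index, station name), is a hierarchical string built from the project root down and hashed under projectId. The StableName helper below is hypothetical. It length-prefixes each value so that a name containing a separator cannot impersonate a deeper path; because the exact encoding is itself part of the v5 input, it would be frozen along with the component choices.

```cpp
#include <string>
#include <string_view>

// Hypothetical helper that builds the v5 "name" for an entity, one
// (kind, value) component per level from the project root down.
// Each value is length-prefixed, so a cave literally named "x;trip:1:y"
// cannot produce the same string as cave "x" containing trip "y".
class StableName {
public:
    StableName& add(std::string_view kind, std::string_view value) {
        text_ += kind;
        text_ += ':';
        text_ += std::to_string(value.size());
        text_ += ':';
        text_ += value;
        text_ += ';';
        return *this;
    }

    // Convenience overload for positional components such as a scrap index.
    StableName& addIndex(std::string_view kind, int index) {
        return add(kind, std::to_string(index));
    }

    const std::string& str() const { return text_; }

private:
    std::string text_;
};
```

In the repair path the resulting string would become the v5 name, for example QUuid::createUuidV5(projectId, QString::fromStdString(name.str())). Index-based components such as the scrap index stay stable only as long as the saved ordering does, which is one of the per-entity questions the audit has to settle.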
Out of scope
Constructor-time UUID generation in cwShot.cpp, cwScrap.cpp, cwNote.cpp, cwNoteLiDAR.cpp, cwSketch.cpp, cwFixStation.cpp, cwLead.cpp, cwTeamMember.cpp, cwNoteStation.h, cwNoteLiDARStation.h, and cwMappedQImage.cpp should stay random — those run when the user creates a wholly new object that doesn't correspond to anything elsewhere in the world.
`cwRemoteAccountModel.cpp` and cwSaveLoad's `d->projectMetadata.projectId = QUuid::createUuid()` in `newProject()` are also fine as random: each account ID and project ID records a single creation event on one machine and is meant to be globally unique.
Risks / caveats
- Legacy projects without a `projectId`: the proto comment notes "legacy projects loaded without a UUID field are left as-is to avoid spurious diffs." For these there is no obvious namespace UUID. Options: (a) fall back to a fixed, application-wide namespace constant; (b) require a projectId to be assigned first, itself derived via v5 from something deterministic, before any subtree repair runs; (c) fall back to random for this edge case and document the conflict risk. This should be decided before implementation.
- Stable-name choice has to be permanent: once v5 derivations ship, changing the input string later means the same legacy data repaired before and after the change gets different UUIDs, which reintroduces the very collision this proposal removes. The audit needs to pick names that won't change as the schema evolves.
- Renames break the v5 chain: if cave name is the v5 input and the user renames the cave, a later repair pass would generate a different UUID. Because repair runs only when the existing UUID is missing or duplicated (not on every load), this is rare in practice, but worth noting.
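The risks above imply two small pieces of logic around the v5 call: choosing a namespace when projectId may be absent, and deriving a new ID only when the stored one is missing or duplicated. The sketch below assumes option (a) purely for illustration, models IDs as raw 16-byte arrays, and takes the derivation as a callback standing in for QUuid::createUuidV5; chooseNamespace, repairId, and the fallback constant are hypothetical, not existing cwSaveLoad functions.

```cpp
#include <array>
#include <cstdint>
#include <functional>
#include <set>
#include <string>

// A UUID modeled as its 16 raw bytes (QUuid would be used in CaveWhere).
using Id = std::array<std::uint8_t, 16>;

// True when every byte is zero, i.e. the "missing ID" case.
bool isNull(const Id& id) {
    for (std::uint8_t b : id) {
        if (b != 0) return false;
    }
    return true;
}

// Option (a): use the project's own ID as the namespace when present,
// otherwise a fixed application-wide namespace. The fallback value is a
// hypothetical constant that would have to be generated once and frozen.
Id chooseNamespace(const Id& projectId, const Id& fallbackNamespace) {
    return isNull(projectId) ? fallbackNamespace : projectId;
}

// Keep the stored ID unless it is missing or already seen during this load;
// only then derive one deterministically. `derive` stands in for
// QUuid::createUuidV5(ns, name).
Id repairId(const Id& stored, std::set<Id>& seen, const Id& ns,
            const std::string& stableName,
            const std::function<Id(const Id&, const std::string&)>& derive) {
    const bool needsRepair = isNull(stored) || seen.count(stored) > 0;
    const Id result = needsRepair ? derive(ns, stableName) : stored;
    seen.insert(result);
    return result;
}
```

Because an existing, unique ID is always kept, a renamed cave only gets a different UUID if its stored ID is later lost or duplicated. Two entities that share a stable name would still derive the same UUID, so the name chosen for each entity type must be unique within the project.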
Related
Surfaced while planning the LAZ-sidecar refactor (`plans/LAZ_SIDECAR_PLAN.html`), which faces the same migration-collision problem at smaller scale and will use v5 on `(projectId, basename)`. That plan can serve as a small-scale prototype for the broader migration here.