Use QUuid::createUuidV5 for identity-repair / migration paths on save objects #522

@vpicaver

Background

CaveWhere stamps fresh UUIDs onto saved objects in two distinct situations:

  1. Construction-time: cwShot, cwScrap, cwNote, cwNoteLiDAR, cwSketch, cwTrip, cwTeamMember, cwLead, cwFixStation, cwNoteStation, cwNoteLiDARStation, etc. all call QUuid::createUuid() in their constructors. This is correct: a user creating a new object should get an identity distinct from anyone else's.
  2. Identity-repair / migration: cwSaveLoad's repairedTopLevelId, regenerateNoteSubtreeIds, regenerateNoteLiDARSubtreeIds, regenerateSketchSubtreeIds, and regenerateTripSubtreeIds stamp UUIDs onto already-existing data when the load pipeline detects missing or duplicate IDs (e.g., loading a legacy project that was saved before UUIDs were added, or one that had a corrupted ID field).

The problem is in (2). Two collaborators who independently load the same legacy .cwproj (or the same as-yet-un-migrated entity tree) and then save will each generate different random UUIDs for what is logically the same object. When their branches merge, every repaired object appears as a duplicate. Git can't tell them apart at the entity level, sync-merge handlers can't pair them, and the user is stuck deduplicating by hand.

Proposal

Replace QUuid::createUuid() with QUuid::createUuidV5(namespace, stableName) in the identity-repair / migration paths. Qt 6.8+ provides this directly; we're on 6.10+ so no new helper is needed. V5 UUIDs are SHA-1-derived deterministic UUIDs — given the same namespace and name, every caller produces the same UUID. Two collaborators repairing the same project converge to identical UUIDs, and the resulting commits are bitwise-equivalent.
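To make the convergence property concrete, here is a minimal sketch in Python rather than Qt, so it runs without a CaveWhere build. Python's `uuid.uuid5` implements the RFC 4122 name-based SHA-1 scheme, which is the scheme `QUuid::createUuidV5` is expected to follow. The project UUID, the `"cave:Example Cave"` name string, and the helper names are hypothetical and only stand in for whatever the audit decides.

```python
import uuid

# Hypothetical project UUID; stands in for projectMetadata.projectId.
PROJECT_ID = uuid.UUID("12345678-1234-5678-1234-567812345678")


def repaired_id_random() -> uuid.UUID:
    """Current behaviour: every repair pass invents a fresh random (v4) UUID."""
    return uuid.uuid4()


def repaired_id_v5(project_id: uuid.UUID, stable_name: str) -> uuid.UUID:
    """Proposed behaviour: derive the UUID from (namespace, stable name)."""
    return uuid.uuid5(project_id, stable_name)


# Two collaborators repair the same logical object independently.
alice = repaired_id_v5(PROJECT_ID, "cave:Example Cave")
bob = repaired_id_v5(PROJECT_ID, "cave:Example Cave")

# With v5 both sides arrive at the same identity.
print(alice == bob)

# With random UUIDs the two repairs diverge.
print(repaired_id_random() == repaired_id_random())
```

The first comparison is true because both inputs are identical; the second is false except for an astronomically unlikely random collision.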

Scope

Audit and update these call sites in cavewherelib/src/cwSaveLoad.cpp:

  • repairedTopLevelId (around line 615) — Cave, Trip, Note top-level repair.
  • regenerateNoteSubtreeIds (around line 632) — note + scraps + scrap stations + leads.
  • regenerateNoteLiDARSubtreeIds (around line 646) — lidar note + lidar stations.
  • regenerateSketchSubtreeIds (around line 654) — sketch + strokes.
  • regenerateTripSubtreeIds (around line 662) — trip + all child note subtrees.

The natural namespace is the project's own UUID (`projectMetadata.projectId`). The natural name is whatever string is stable for that entity within the project — e.g. cave name, trip name within cave, note's image-file relative path, scrap index within note, station name within scrap, etc. Each entity type needs a deliberate decision; this is where the audit work lives.
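One possible way to build the stable names, sketched in Python with the standard `uuid` module: chain the derivation so that each parent's derived UUID becomes the namespace for its children. This keeps "trip name within cave" and "scrap index within note" unambiguous without inventing a path-separator convention. The `derive_id` helper, the `kind:name` string format, and all example names are assumptions for illustration, not a decided design.

```python
import uuid


def derive_id(parent_id: uuid.UUID, kind: str, name: str) -> uuid.UUID:
    """Derive a child UUID from its parent's UUID plus a per-kind stable name.

    `kind` is a fixed tag such as "cave" or "trip" (hypothetical), so a cave
    and a trip that happen to share a name still get different UUIDs.
    """
    return uuid.uuid5(parent_id, f"{kind}:{name}")


# Hypothetical project UUID; stands in for projectMetadata.projectId.
project_id = uuid.UUID("12345678-1234-5678-1234-567812345678")

cave_id = derive_id(project_id, "cave", "Example Cave")
trip_id = derive_id(cave_id, "trip", "Trip 1")
note_id = derive_id(trip_id, "note", "notes/page1.png")  # image-file relative path
scrap_id = derive_id(note_id, "scrap", "0")              # scrap index within note

# The same trip name under a different cave yields a different trip UUID.
other_cave_id = derive_id(project_id, "cave", "Other Cave")
other_trip_id = derive_id(other_cave_id, "trip", "Trip 1")
print(trip_id != other_trip_id)
```

A trade-off of chaining is that a change to a parent's input string changes every descendant UUID as well, which is relevant to the rename caveat under Risks.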

Out of scope

Constructor-time UUID generation in cwShot.cpp, cwScrap.cpp, cwNote.cpp, cwNoteLiDAR.cpp, cwSketch.cpp, cwFixStation.cpp, cwLead.cpp, cwTeamMember.cpp, cwNoteStation.h, cwNoteLiDARStation.h, and cwMappedQImage.cpp should stay random — those run when the user creates a wholly new object that doesn't correspond to anything elsewhere in the world.

`cwRemoteAccountModel.cpp` and cwSaveLoad's `d->projectMetadata.projectId = QUuid::createUuid()` in `newProject()` are also fine as random: account IDs and project IDs are intentionally globally unique per-machine creation events.

Risks / caveats

  • Legacy projects without a `projectId` — the proto comment notes "legacy projects loaded without a UUID field are left as-is to avoid spurious diffs." For these, there's no obvious namespace UUID. Options: (a) fall back to a fixed application-wide namespace constant; (b) require a projectId to be assigned (stamping a v5'd-on-something-deterministic value) before any subtree repair runs; (c) fall back to random for this edge case and document the conflict risk. Worth deciding before implementation.
  • Stable-name choice has to be irrevocable — once we ship v5 derivations, changing the input string later means previously-repaired projects get fresh random-looking UUIDs. The audit needs to pick names that won't change as the schema evolves.
  • Renames break the v5 chain — if cave name is the v5 input and the user renames the cave, future repair passes would generate a different UUID. Since repair runs only when the existing UUID is missing/duplicate (not on every load), this is rare in practice, but worth noting.
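Two of these caveats can be illustrated with a short Python sketch using the standard `uuid` module: option (a) for projects without a `projectId`, and the effect of a rename on the derived UUID. The `FALLBACK_NAMESPACE` value, the `cavewhere.example` string it is derived from, and the `repair_namespace` helper are hypothetical placeholders, not existing CaveWhere code.

```python
import uuid
from typing import Optional

# Hypothetical fixed application-wide namespace for option (a).
FALLBACK_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "cavewhere.example")


def repair_namespace(project_id: Optional[uuid.UUID]) -> uuid.UUID:
    """Use the project's own UUID when present, else the fixed fallback."""
    return project_id if project_id is not None else FALLBACK_NAMESPACE


# A legacy project with no projectId falls back to the shared namespace.
legacy_ns = repair_namespace(None)

# Renaming the cave between two repair passes changes the derived UUID.
before_rename = uuid.uuid5(legacy_ns, "cave:Old Name")
after_rename = uuid.uuid5(legacy_ns, "cave:New Name")
print(before_rename != after_rename)

# Under option (a), two unrelated legacy projects that both contain a cave
# called "Old Name" derive the same UUID, because the namespace is shared.
unrelated_project = uuid.uuid5(repair_namespace(None), "cave:Old Name")
print(unrelated_project == before_rename)
```

The last comparison shows a cost of option (a) that may be worth weighing: identities are only unique per name, not per project, until a real `projectId` is available as the namespace.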

Related

Surfaced while planning the LAZ-sidecar refactor (`plans/LAZ_SIDECAR_PLAN.html`), which faces the same migration-collision problem at smaller scale and will use v5 on `(projectId, basename)`. That plan can serve as a small-scale prototype for the broader migration here.
