# ID type

## Summary

Allow users to name the primary key column when defining a schema, instead of forcing it to be `"id"`. The ID column remains **required** and **auto-generated (INT64)**, but the name is user-defined.

## Motivation

Today every node schema has a hidden, hardcoded `"id"` column prepended by `prepend_id_field()`. Users cannot:
- Name the primary key to match their domain model (e.g. `person_id`, `order_id`, `pk`)
- Distinguish primary keys across schemas in joined results without relying on `schema.id` dot-notation
- Import external datasets that use a different column name for the identifier

The `"id"` string is baked into ~15 code locations across 10+ files via `field_names::kId`.

## Current Behavior

1. **Schema creation** (`SchemaRegistry::create` in `src/schema/schema.cpp`):
   Calls `prepend_id_field()` which unconditionally adds `arrow::field("id", arrow::int64())` as field 0.

2. **Node creation** (`NodeManager::create_node` in `include/core/node.hpp`):
   Writes the auto-generated ID into the field named `field_names::kId` (`"id"`).

3. **Query execution** (`src/query/execution.cpp`, `include/query/row.hpp`):
   Looks up `field_names::kId` by name to extract node IDs for join keys, row deduplication, and result ordering.

4. **Storage restore** (`src/storage/storage.cpp`):
   Matches `column_name == field_names::kId` to extract the node ID during shard reload.

5. **Shell** (`apps/tundra_shell.cpp`):
   Skips the `"id"` field during INSERT (auto-generated), displays it in results.

6. **Edge store** (`src/core/edge_store.cpp`):
   Edges have their own structural `"id"` column (separate concern, see non-goals).

## Proposed Design

### Schema-level ID metadata

Add an `id_field_name` attribute to `Schema`:

```cpp
struct Schema {
  // ... existing members ...
  std::string id_field_name_;  // e.g. "person_id", default "id"

  const std::string& id_field_name() const { return id_field_name_; }
  std::shared_ptr<Field> id_field() const { return get_field(id_field_name_); }
};
```

### Schema creation API

When creating a schema, the user specifies which field is the ID:

```
CREATE (Person { person_id: ID, name: STRING, age: INT32 })
```

The `ID` type marker tells the system:
- This field is the primary key
- It is `INT64`, auto-generated, non-nullable
- A schema has exactly one ID field; marking more than one field `ID` is an error

If no field is marked `ID`, the system can either:
- **(A)** Reject the schema with an error: "exactly one ID field required"
- **(B)** Auto-prepend a default `"id"` field (backward-compatible)

**Recommendation:** Option B for backward compatibility, with a deprecation warning encouraging explicit ID declaration.
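
The ID rules above, combined with Option B, can be sketched as a small resolution step that runs before the Arrow schema is built. This is a minimal illustration, not the proposed implementation: `FieldType`, `FieldSpec`, `ResolvedSchema`, and `resolve_id_field` are hypothetical names introduced here, and the real code would operate on the project's own field types.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the real field and type definitions.
enum class FieldType { ID, INT32, INT64, STRING };

struct FieldSpec {
  std::string name;
  FieldType type;
};

struct ResolvedSchema {
  std::string id_field_name;
  std::vector<FieldSpec> fields;
  bool used_default_id = false;  // caller can emit the deprecation warning
};

// Finds the single ID-typed field, rejects duplicates, and (Option B)
// prepends a default "id" field when none was declared.
inline ResolvedSchema resolve_id_field(std::vector<FieldSpec> fields) {
  std::optional<std::string> id_name;
  for (const FieldSpec& f : fields) {
    if (f.type != FieldType::ID) continue;
    if (id_name) {
      throw std::invalid_argument("schema declares more than one ID field");
    }
    id_name = f.name;
  }

  ResolvedSchema out;
  if (!id_name) {
    // Option B: keep the legacy layout, with "id" as field 0.
    fields.insert(fields.begin(), FieldSpec{"id", FieldType::ID});
    id_name = "id";
    out.used_default_id = true;
  }

  // Invariant after resolution: exactly one ID field.
  assert(std::count_if(fields.begin(), fields.end(), [](const FieldSpec& f) {
           return f.type == FieldType::ID;
         }) == 1);

  out.id_field_name = *id_name;
  out.fields = std::move(fields);
  return out;
}
```

Under Option A, the `!id_name` branch would throw instead of prepending; the duplicate check is the same in both options.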

### Internal changes

Replace all `field_names::kId` lookups on node schemas with `schema->id_field_name()`:

| Location | Current | Proposed |
|----------|---------|----------|
| `Schema::create_arrow_schema` | Prepends hardcoded `"id"` | Validates exactly one `ID`-typed field exists |
| `NodeManager::create_node` | `schema->get_field("id")` | `schema->id_field()` |
| `prepend_id_field()` | Adds `arrow::field("id", int64)` | Remove; ID field is part of user schema |
| `Storage::read_shard` | `column_name == kId` | `column_name == schema->id_field_name()` |
| `Row::extract_schema_ids` | `field == kId` | `field == schema->id_field_name()` |
| Query execution (join keys, dedup) | `kId` | Resolve from schema at query-plan time |
| `tundra_shell.cpp` CREATE | Skips `"id"` field | Skips the schema's `id_field_name()` |
| Shard min_id / max_id tracking | Assumes `"id"` | Uses `schema->id_field_name()` |
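
As a rough illustration of the storage-side rows in the table, the sketch below resolves the ID column from the schema instead of comparing against a fixed name, then derives a shard's min/max ID from that column's values. `NodeSchema`, `find_id_column`, `IdRange`, and `id_range` are hypothetical, simplified stand-ins; the actual `Storage::read_shard` works on Arrow tables and is not shown in this issue.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, reduced view of Schema: only the ID name matters here.
struct NodeSchema {
  std::string id_field_name_ = "id";
  const std::string& id_field_name() const { return id_field_name_; }
};

// Returns the index of the schema's ID column, or nullopt if the shard
// does not contain a column with that name.
inline std::optional<std::size_t> find_id_column(
    const NodeSchema& schema, const std::vector<std::string>& column_names) {
  assert(!schema.id_field_name().empty());  // ID name must be resolved
  for (std::size_t i = 0; i < column_names.size(); ++i) {
    if (column_names[i] == schema.id_field_name()) return i;
  }
  return std::nullopt;
}

struct IdRange {
  std::int64_t min_id;
  std::int64_t max_id;
};

// Computes min/max over the ID column values; nullopt for an empty shard.
inline std::optional<IdRange> id_range(const std::vector<std::int64_t>& ids) {
  if (ids.empty()) return std::nullopt;
  const auto [lo, hi] = std::minmax_element(ids.begin(), ids.end());
  return IdRange{*lo, *hi};
}
```

A lookup that returns `nullopt` for a schema's own ID name likely indicates a shard written under a different schema version, which is probably better surfaced as an error than silently skipped.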

### Persistence / metadata

`SchemaMetadata` (used for snapshot serialization) needs a new field:

```json
{
  "name": "Person",
  "id_field": "person_id",
  "fields": [ ... ]
}
```

Existing snapshots without `"id_field"` default to `"id"` for backward compatibility.
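
The fallback for older snapshots could look roughly like this. The sketch assumes the JSON has already been parsed into a record where a missing `"id_field"` key becomes an empty optional; `SchemaMetadataRecord`, `kDefaultIdFieldName`, and `effective_id_field` are hypothetical names, and the real `SchemaMetadata` type may be shaped differently.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical shape of one deserialized snapshot entry. `id_field` is
// empty for snapshots written before this change.
struct SchemaMetadataRecord {
  std::string name;
  std::optional<std::string> id_field;
};

// Assumed to mirror field_names::kId once it becomes "the default".
inline constexpr const char* kDefaultIdFieldName = "id";

// Returns the ID field name to use when restoring a schema.
inline std::string effective_id_field(const SchemaMetadataRecord& m) {
  return m.id_field.value_or(kDefaultIdFieldName);
}
```

Writing `"id_field"` explicitly on every save, even when it equals `"id"`, would keep new snapshots self-describing and leave the fallback path for legacy files only.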

### Edge ID columns

Edge structural columns (`"id"`, `"source_id"`, `"target_id"`, `"created_ts"`) remain hardcoded. Edges are system-managed and not user-schema-defined — this is a separate concern. A future enhancement could allow custom edge property schemas, but that is out of scope here.

## Affected Files

| File | Change |
|------|--------|
| `include/schema/schema.hpp` | Add `id_field_name_`, `id_field()`, validation |
| `src/schema/schema.cpp` | Validate ID field on creation, remove `prepend_id_field` usage |
| `include/arrow/utils.hpp` | Deprecate/remove `prepend_id_field()` |
| `src/arrow/utils.cpp` | Remove `prepend_id_field()` |
| `include/core/node.hpp` | Use `schema->id_field()` instead of `kId` |
| `src/storage/storage.cpp` | Use `schema->id_field_name()` for ID column detection |
| `include/query/row.hpp` | Use schema-aware ID resolution |
| `src/query/execution.cpp` | Use schema-aware ID resolution in join/dedup logic |
| `apps/tundra_shell.cpp` | Use schema-aware ID skip in CREATE, display in results |
| `include/storage/metadata.hpp` | Add `id_field` to `SchemaMetadata` serialization |
| `include/common/constants.hpp` | `kId` becomes the default, not the only option |
| `src/main/database.cpp` | Pass schema context where ID field name is needed |


## Testing

- Unit test: create schema with custom ID name, insert nodes, verify ID column name in results
- Unit test: create schema without explicit ID, verify `"id"` is auto-prepended (backward compat)
- Unit test: reject schema with two `ID` fields
- Unit test: reject schema with zero `ID` fields (if option A chosen)
- Snapshot round-trip: save and restore a schema with custom ID, verify field names survive
- Join test: join two schemas with different ID column names
- Shell test: CREATE/MATCH with custom ID field name

## Non-Goals

- Custom ID types (e.g. UUID, STRING primary keys) — INT64 auto-increment only for now
- Custom edge ID column names — edges remain system-managed
- Composite primary keys — single-column only

