ID type
Summary
Allow users to name the primary key column when defining a schema, instead of forcing it to be "id". The ID column remains required and auto-generated (INT64), but the name is user-defined.
Motivation
Today every node schema has a hidden, hardcoded "id" column prepended by prepend_id_field(). Users cannot:
- Name the primary key to match their domain model (e.g. person_id, order_id, pk)
- Distinguish primary keys across schemas in joined results without relying on schema.id dot-notation
- Import external datasets that use a different column name for the identifier
The "id" string is baked into ~15 code locations across 10+ files via field_names::kId.
Current Behavior
- Schema creation (SchemaRegistry::create in src/schema/schema.cpp): Calls prepend_id_field(), which unconditionally adds arrow::field("id", arrow::int64()) as field 0.
- Node creation (NodeManager::create_node in include/core/node.hpp): Writes the auto-generated ID into the field named field_names::kId ("id").
- Query execution (src/query/execution.cpp, include/query/row.hpp): Looks up field_names::kId by name to extract node IDs for join keys, row deduplication, and result ordering.
- Storage restore (src/storage/storage.cpp): Matches column_name == field_names::kId to extract the node ID during shard reload.
- Shell (apps/tundra_shell.cpp): Skips the "id" field during INSERT (auto-generated), displays it in results.
- Edge store (src/core/edge_store.cpp): Edges have their own structural "id" column (separate concern, see non-goals).
Proposed Design
Schema-level ID metadata
Add an id_field_name attribute to Schema:
struct Schema {
    // ... existing members ...
    std::string id_field_name_;  // e.g. "person_id", default "id"

    const std::string& id_field_name() const { return id_field_name_; }
    std::shared_ptr<Field> id_field() const { return get_field(id_field_name_); }
};
Schema creation API
When creating a schema, the user specifies which field is the ID:
CREATE (Person { person_id: ID, name: STRING, age: INT32 })
The ID type marker tells the system:
- This field is the primary key
- It is INT64, auto-generated, non-nullable
- Exactly one field per schema must be marked ID
If no field is marked ID, the system can either:
- (A) Reject the schema with an error: "exactly one ID field required"
- (B) Auto-prepend a default "id" field (backward-compatible)
Recommendation: Option B for backward compatibility, with a deprecation warning encouraging explicit ID declaration.
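The Option B rule can be sketched as a small validation step at schema creation. This is a minimal sketch under stated assumptions, not the project's code: FieldType, FieldDecl, ResolvedSchema, and resolve_id_field are hypothetical stand-ins for the real Arrow-backed types, and the warning text is illustrative. It does not handle a user field that is already named "id" but not marked ID.

```cpp
#include <algorithm>
#include <cassert>
#include <iostream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified field model (the real schema uses Arrow fields).
enum class FieldType { ID, INT32, INT64, STRING };

struct FieldDecl {
    std::string name;
    FieldType type;
};

struct ResolvedSchema {
    std::string id_field_name;      // name of the single ID field
    std::vector<FieldDecl> fields;  // always contains exactly one ID field
};

// Option B: zero ID fields -> prepend a default "id" and warn;
// more than one ID field -> reject the schema.
inline ResolvedSchema resolve_id_field(std::vector<FieldDecl> fields) {
    const auto is_id = [](const FieldDecl& f) { return f.type == FieldType::ID; };
    const auto id_count = std::count_if(fields.begin(), fields.end(), is_id);

    if (id_count > 1) {
        throw std::invalid_argument("at most one field may be marked ID");
    }
    if (id_count == 0) {
        std::cerr << "warning: no ID field declared; defaulting to \"id\" "
                     "(implicit ID is deprecated)\n";
        fields.insert(fields.begin(), FieldDecl{"id", FieldType::ID});
        return ResolvedSchema{"id", std::move(fields)};
    }

    const auto it = std::find_if(fields.begin(), fields.end(), is_id);
    assert(it != fields.end());     // guaranteed by id_count == 1
    std::string id_name = it->name; // copy before moving the vector
    return ResolvedSchema{std::move(id_name), std::move(fields)};
}
```

Choosing Option A instead would turn the id_count == 0 branch into a second error path.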
Internal changes
Replace all field_names::kId lookups on node schemas with schema->id_field_name():
| Location | Current | Proposed |
|---|---|---|
| Schema::create_arrow_schema | Prepends hardcoded "id" | Validates exactly one ID-typed field exists |
| NodeManager::create_node | schema->get_field("id") | schema->id_field() |
| prepend_id_field() | Adds arrow::field("id", int64) | Remove; ID field is part of user schema |
| Storage::read_shard | column_name == kId | column_name == schema->id_field_name() |
| Row::extract_schema_ids | field == kId | field == schema->id_field_name() |
| Query execution (join keys, dedup) | kId | Resolve from schema at query-plan time |
| tundra_shell.cpp CREATE | Skips "id" field | Skips the schema's id_field_name() |
| Shard min_id / max_id tracking | Assumes "id" | Uses schema->id_field_name() |
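The common pattern behind these replacements is that the ID column name is read from the schema instead of a global constant. The following is a rough sketch of that pattern only: the Schema and Row types here are hypothetical simplifications (the real Row and Schema are Arrow-backed), and extract_node_id and is_auto_generated_column are illustrative names, not existing functions.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical minimal schema: only the part relevant to ID resolution.
struct Schema {
    std::string name;
    std::string id_field_name = "id";
};

// Hypothetical row: column name -> INT64 value (other value types omitted).
using Row = std::unordered_map<std::string, std::int64_t>;

// Previously the lookup key would be the constant "id";
// here it is taken from the row's schema.
inline std::optional<std::int64_t> extract_node_id(const Schema& schema,
                                                   const Row& row) {
    assert(!schema.id_field_name.empty());
    const auto it = row.find(schema.id_field_name);
    if (it == row.end()) {
        return std::nullopt;
    }
    return it->second;
}

// Shell-side check: the ID column is auto-generated, so user input for it
// is skipped when building a new node.
inline bool is_auto_generated_column(const Schema& schema,
                                     const std::string& column) {
    return column == schema.id_field_name;
}
```

With two schemas that use different ID names, each row's ID is resolved through its own schema, which is what the join and dedup paths need.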
Persistence / metadata
SchemaMetadata (used for snapshot serialization) needs a new field:
{
  "name": "Person",
  "id_field": "person_id",
  "fields": [ ... ]
}
Existing snapshots without "id_field" default to "id" for backward compatibility.
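The fallback for older snapshots could look roughly like the sketch below. It assumes the deserialized metadata exposes id_field as an optional value; this SchemaMetadata layout, kDefaultIdField, and effective_id_field are hypothetical and only illustrate the defaulting rule, not the actual serialization code.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical view of the deserialized metadata; "fields" is omitted.
struct SchemaMetadata {
    std::string name;
    std::optional<std::string> id_field;  // absent in pre-change snapshots
};

inline constexpr const char* kDefaultIdField = "id";

// Returns the ID column name to use for a restored schema:
// the stored "id_field" if present, otherwise the legacy default "id".
inline std::string effective_id_field(const SchemaMetadata& meta) {
    std::string result = meta.id_field.value_or(kDefaultIdField);
    assert(!result.empty());  // debug-only sanity check on snapshot contents
    return result;
}
```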
Edge ID columns
Edge structural columns ("id", "source_id", "target_id", "created_ts") remain hardcoded. Edges are system-managed and not user-schema-defined — this is a separate concern. A future enhancement could allow custom edge property schemas, but that is out of scope here.
Affected Files
| File | Change |
|---|---|
| include/schema/schema.hpp | Add id_field_name_, id_field(), validation |
| src/schema/schema.cpp | Validate ID field on creation, remove prepend_id_field usage |
| include/arrow/utils.hpp | Deprecate/remove prepend_id_field() |
| src/arrow/utils.cpp | Remove prepend_id_field() |
| include/core/node.hpp | Use schema->id_field() instead of kId |
| src/storage/storage.cpp | Use schema->id_field_name() for ID column detection |
| include/query/row.hpp | Use schema-aware ID resolution |
| src/query/execution.cpp | Use schema-aware ID resolution in join/dedup logic |
| apps/tundra_shell.cpp | Use schema-aware ID skip in CREATE, display in results |
| include/storage/metadata.hpp | Add id_field to SchemaMetadata serialization |
| include/common/constants.hpp | kId becomes the default, not the only option |
| src/main/database.cpp | Pass schema context where ID field name is needed |
Testing
- Unit test: create schema with custom ID name, insert nodes, verify ID column name in results
- Unit test: create schema without explicit ID, verify "id" is auto-prepended (backward compat)
- Unit test: reject schema with two ID fields
- Unit test: reject schema with zero ID fields (if option A chosen)
- Snapshot round-trip: save and restore a schema with custom ID, verify field names survive
- Join test: join two schemas with different ID column names
- Shell test: CREATE/MATCH with custom ID field name
Non-Goals
- Custom ID types (e.g. UUID, STRING primary keys) — INT64 auto-increment only for now
- Custom edge ID column names — edges remain system-managed
- Composite primary keys — single-column only