Skip to content

Schema Support - Logical Separation for Stores#343

Open
gregorian-09 wants to merge 28 commits into
deven96:mainfrom
gregorian-09:main
Open

Schema Support - Logical Separation for Stores#343
gregorian-09 wants to merge 28 commits into
deven96:mainfrom
gregorian-09:main

Conversation

@gregorian-09

@gregorian-09 gregorian-09 commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Closes #319
Supersedes #341 (source branch feat/schema-support was deleted).

Rebased on latest upstream main with additional fixes:

  • Schema::try_new to propagate errors instead of panicking on empty string
  • Versioned persistence trait with auto-migration (V1 flat -> V2 nested)
  • Hardcoded public references replaced with Schema::DEFAULT_NAME
  • Clippy clean, format clean

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably add one DB and AI json for the new version formats respectively (if that doesn't exist already)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done Sir. Added ai_v2_snapshot.json and db_v2_snapshot.json fixtures under ahnlich/ai/src/tests/fixtures/ and ahnlich/db/src/tests/fixtures/, plus migration tests for both DB and AI loading these V2 snapshots.

@gregorian-09

Copy link
Copy Markdown
Contributor Author

tests::server_tests::test_mmap_persistence_performance is still failing. I’ll do a proper investigation tomorrow and see what’s causing it.

Comment thread ahnlich/ai/src/engine/operations.rs Outdated
) -> Result<Del, Status> {
let store_names: Vec<StoreName> = store_handler
.list_stores()
.list_stores(None)

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So purge stores would only work on public schema?

@gregorian-09 gregorian-09 Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct, purge_stores now only works on the public schema (ahnlich/ai/src/engine/operations.rs:401). Previously list_stores(None) returned stores from all schemas while drop_store with schema: None defaulted to public, so non-public schema stores were listed but never actually dropped - this was a bug. I already fixed that by scoping to list_stores(Some(&Schema::default())) and passing schema explicitly to DbDropStore. If you'd like it to be schema-aware and purge all schemas, that would require adding schema to AiStoreInfo and propagating it through list_stores. Happy to go that route if preferred, let me know.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think for consistency sake it would be better to make purge stores schema aware as well

Comment thread ahnlich/ai/src/engine/versioned.rs Outdated

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work on these.... maybe generate one using an actual test so that the fields are not empty e.g id_to_value, inner, allowed_predicates, algorithm_to_index are currently empty in this test so if anything changes internally in those, this test would still pass

I realise that may be extra work to do but it would be great utility as a regression test for schema changes in general going forward

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good call. I have gone ahead and done exactly that. The V2 fixture at db/src/tests/fixtures/db_v2_snapshot.json now has actual data: two entries with embeddings + metadata, predicates with values wired up, and a built KdTree index. One thing I ran into: StoreKeyId was using #[serde(transparent)] over u64, but JSON map keys are always strings and u64::deserialize chokes on strings so the roundtrip was broken. Fixed that by switching to string-based serde for StoreKeyId (in db/src/engine/store.rs).

The fixture is generated from an #[ignore]d test so you can regenerate whenever internal types change, just run cargo test -p db --lib generate_db_v2_fixture -- --ignored. Old V1 snapshots still load fine, and all 67 db lib tests pass

@deven96 deven96 Jun 17, 2026

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good shout with the id_to_value StoreKeyId. However, it has worked all this while as u64 map key would be serialized as a string and and also deserialized as such automatically. You can test it with something simple like this

let test_map: HashMap<u64, String> = HashMap::from_iter([(10, "Random".into())]);
serde_json::to_string_pretty(&test_map).unwrap();

It's only an issue when you handroll your json like in this case to account for an older format... so we can get rid of the manual se-deser handling

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You were right. The fixture was the problem, not serde. I regenerated db_v2_snapshot.json from an actual test so the keys come out as numbers matching #[serde(transparent)], dropped the custom serde, and also refactored the tagged enum loading in versioned.rs to peel the version tag first with std::collections::HashMap before converting to papaya, that was what was actually breaking the persistence roundtrip, not the transparent derive itself. Clean on both ends now.

Comment thread ahnlich/db/src/engine/versioned.rs Outdated
Comment thread ahnlich/db/src/tests/server_tests.rs
Comment thread ahnlich/db/src/tests/server_tests.rs Outdated
Comment thread ahnlich/ai/src/tests/aiproxy_test.rs Outdated
assert_eq!(response.stores.len(), 1);
assert_eq!(response.stores[0].name, "AiCustomStore");

// List stores with no filter - should return all 2 stores

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

List stores with no filter shouldn't return all stores.... would be best to stick to the behaviour to default to public if schema is None

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list_stores(None) now defaults to public on both DB and AI sides. No more sweeping across all schemas when no schema is given. Same consistent behavior as everything else: if schema is unspecified, it's "public".

assert!(names.contains(&"CustomStore"));
assert!(names.contains(&"CustomStore2"));

// List stores with no schema filter - should return all 3 stores

@deven96 deven96 Jun 17, 2026

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably best to make consistent with all others. Listing all stores across all schemas can become very unwieldy

@deven96

deven96 commented Jun 17, 2026

Copy link
Copy Markdown
Owner

There's been some merged changes from @jimezesinachi with respect to clustering mode for ahnlich so perhaps @jimezesinachi can help work through what would be needed here to make the modifications for logical separation fit with his changes

Introduce a `Schema` type wrapping a `String` to provide logical
separation for stores in ahnlich. This is the foundation for
issue deven96#319 (multi-tenant store isolation).

Key design decisions:
- Newtype over `String` (not protobuf-generated), hand-written in
  `types/src/schema.rs`.
- Default schema name is "public" to preserve backward compatibility
  with all existing stores.
- Derives `Hash`, `Eq`, `Ord`, `Clone`, `Debug`, `Serialize`,
  `Deserialize` for use as a map key.
- `From<String>` and `From<&str>` for ergonomic construction.
- Panics on empty name to prevent silent corruption.
- Separate from the protobuf layer so we can evolve the schema
  model without regenerating proto types.

This commit only adds the type; refactoring the store engine to
use it comes next.
Introduce a `Schema` type wrapping a `String` to provide logical
separation for stores in ahnlich. This is the foundation for
issue deven96#319 (multi-tenant store isolation).

Key design decisions:
- Newtype over `String` (not protobuf-generated), hand-written in
  `types/src/schema.rs`.
- Default schema name is "public" to preserve backward compatibility
  with all existing stores.
- Derives `Hash`, `Eq`, `Ord`, `Clone`, `Debug`, `Serialize`,
  `Deserialize` for use as a map key.
- `From<String>` and `From<&str>` for ergonomic construction.
- Panics on empty name to prevent silent corruption.
- Separate from the protobuf layer so we can evolve the schema
  model without regenerating proto types.

This commit only adds the type; refactoring the store engine to
use it comes next.
…ema map

Address two compilation issues in the DB store handler refactoring:
- Use `existing.current.clone()` instead of `existing.clone()` on
  papaya's `OccupiedError` (which does not implement Clone).
- Resolve type inference for `inner.pin().len()` in `drop_schema`.

Update operations.rs and handler.rs pass `Schema::default()` in all
`StoreHandler` method calls, since the protobuf layer does not yet
carry a schema field.

This maintains full backward compatibility: every store operation
implicitly targets the `"public"` schema.
Apply the same nested-map pattern from db StoreHandler to
AIStoreHandler:

- Change AIStores from `Map<StoreName, AIStore>` to
  `Map<Schema, Map<StoreName, AIStore>>`.
- Add `AIInnerStores` type alias.
- Add `get_or_create_schema()` and `get_schema()` helpers.
- Update `create_store()`, `get()`, `list_stores()`, `get_store()`,
  `set()`, `validate_and_prepare_store_data()`,
  `get_ndarray_repr_for_store()`, `drop_store()`, `store_original()`,
  and `purge_stores()` to accept and use a schema parameter.
- Add `drop_schema()` method (guarded: cannot drop "public").
- Add `AIProxyError::InvalidArgument(String)` variant for schema
  guard rejection.

All existing callers pass `Schema::default()` for backward
compatibility.
Update all AI operations and the AI server handler to pass
`Schema::default()` in direct `AIStoreHandler` method calls.

This completes the schema-aware refactoring of the AI proxy layer.
All existing store operations continue to target the `"public"`
schema until the protobuf layer carries an explicit schema field.
The types crate's build.rs deletes all files in `src/` except `utils/`
before regenerating protobuf code. This caused our hand-written
`schema.rs` to be silently deleted on every `cargo build/check`,
and caused `pub mod schema;` to be omitted from `lib.rs`.

Fix the deletion loop to also preserve `schema.rs`, and add
`"schema"` to the module-name set written into `lib.rs`.

Also fix type inference for `inner.pin().len()` in both
`StoreHandler::drop_schema` and `AIStoreHandler::drop_schema`
by binding the pin guard to a local variable first.
The schema parameter was added to all store handler methods in a previous
commit, but threading &Schema through every method signature introduced
too_many_arguments and leaked schema handling into every caller. Each
handler already holds a default_schema field that methods use implicitly.

- Remove schema param from StoreHandler: create_store, create_pred_index,
  create_non_linear_algorithm_index, del_key_in_store, del_pred_in_store,
  get_sim_in_store, get_pred_in_store, get_key_in_store, set_in_store,
  get_store, drop_pred_index_in_store, drop_non_linear_algorithm_index, drop_store.
- Remove schema param from AIStoreHandler: get, create_store, get_store,
  validate_and_prepare_store_data, set, get_ndarray_repr_for_store, drop_store, store_original.
- Add default_schema field to AIStoreHandler struct and new().
- Drop #[allow(clippy::too_many_arguments)] on set and get_ndarray_repr_for_store.
- Update all callers to stop passing &Schema::default().
- Fix del_pred_in_store incorrectly named create_pred_index.
- Remove unused Schema imports.
Add optional schema field to CreateStore, DropStore, ListStores,
and GetStore protobuf messages. Add DropSchema message and RPC
for dropping entire schemas.

Proto changes:
- Add optional string schema field to CreateStore, DropStore,
  ListStores, GetStore in both db/ and ai/ query protos
- Add DropSchema message (string schema) to both db/ and ai/ protos
- Add DropSchema to pipeline oneof and service RPC definitions

Store handler changes:
- get() searches all schemas (default_schema first, then fallback)
- create_store, drop_store accept &Schema parameter
- list_stores accepts Option<&Schema> for filtering
- drop_schema already existed but is now wired through protobuf

Operations layer:
- Each operation extracts schema from request, resolves to
  Schema (defaults to "public" when None), passes to handler
- Add drop_schema function for both DB and AI operations
- Schema import added where needed

All 58 DB tests pass. Both crates compile with clippy.
…tubs

- Fix purge_stores counting schemas instead of stores (ai/src/engine/store.rs)
- Add AI proxy schema tests (ai/src/tests/aiproxy_test.rs)
- Add DB schema tests (db/src/tests/server_tests.rs)
- Fix DSL missing schema fields (dsl/src/ai.rs, dsl/src/db.rs, tests)
- Regenerate all SDK protobuf stubs (Go, Node, Python)
- Add build_fixed.py for Python proto generation
- Add multi-schema docs (docs/schema.md)
- Go: goimports + gofmt
- Node: prettier
- Python: isort import sorting
- Add read_snapshot_raw and load_snapshot_with_migration to Persistence
- Add StoreHandler::load_and_migrate_snapshot for DB flat->nested migration
- Add AIStoreHandler::load_and_migrate_snapshot for AI flat->nested migration
- Wire migration into DB and AI server handler startup
- Add migration tests (2 DB + 2 AI) with JSON fixture files
- Fix benchmark create_store calls to pass schema parameter
- Fix Go SDK lint: add package comments, fix revive warnings
- Replace redundant closures with tuple variant in map_err calls
- Remove build_fixed.py from Python SDK as requested
- Added schema field to AiStoreInfo proto (field 8) for all SDKs
- list_stores(None) now defaults to public schema instead of all schemas
- Added all_store_names_by_schema() for internal cross-schema iteration
- purge_stores now drops DB stores across all schemas using correct schema per store
- Regenerated all gRPC stubs (Rust, Go, Node, Python)
- Populated schema in get_store from request params
…NodeId

#[serde(transparent)] breaks JSON map key deserialization because
serde_json's KeyDeserialize trait doesn't propagate through transparent
newtypes. JSON map keys are always strings, and u64::deserialize chokes
on strings. Restored custom serde that serializes as string and
deserializes from both strings and numbers.
@gregorian-09

gregorian-09 commented Jun 19, 2026

Copy link
Copy Markdown
Contributor Author

There's been some merged changes from @jimezesinachi with respect to clustering mode for ahnlich so perhaps @jimezesinachi can help work through what would be needed here to make the modifications for logical separation fit with his changes

@deven96 @jimezesinachi

So, I went through this diff by diff against everything Jim's clustering touches. writing it up because I want someone to actually check the one thing I'm not 100% sure about, not just rubber-stamp it.

store_runtime.rs / cluster_mutations.rs - clean, no diff. enum, submit_db_command!, all of Jim's stuff is exactly as he left it.

handler.rs - three additions, all additive:

  • drop_schema handler, copy-pasted the drop_store pattern:
StoreRuntime::Cluster(cluster) => {
    submit_db_command!(Some(cluster), query::DropSchema, params, DbCommand::DropSchema)
}
StoreRuntime::Standalone(store_handler) => {
    operations::drop_schema(store_handler, params)?
}
  • list_stores actually forwards the request body now instead of ignoring it (list_stores_response(&self.runtime, request.into_inner())). cluster dispatch inside is untouched, this just lets schema: None flow through to standalone.
  • get_store resolves schema before dispatch, same split as before.
  • new_with_config swapped Persistence::load_snapshot for load_snapshot_with_migration to handle the versioned format. standalone-only, cluster has its own snapshot mechanism so this doesn't touch it.

cluster_queries.rs - one new param:

pub(crate) async fn list_stores_response(
    runtime: &StoreRuntime,
    params: query::ListStores,
) -> Result<server::StoreList, tonic::Status> {
    if let Some(cluster) = runtime.cluster() {
        cluster.list_stores(params).await?;
    }
    read_store_handler(runtime, |store_handler| {
        Ok(operations::list_stores(store_handler, params))
    })
}

heads up - the cluster branch's return value isn't used, it always falls through to read_store_handler after. pretty sure that's pre-existing and not something I introduced, but flagging it because it reads weird on first pass. param only actually lands in operations::list_stores via the standalone path.

replication/mod.rs - two changes:

  • DropSchema command, same shape as DropStore, nothing surprising.
  • get_snapshot now does into_latest() to turn VersionedDbStores into Stores before handing it to the cluster, since snapshots are versioned internally but the cluster snapshot type is still plain Stores:
fn get_snapshot(&self) -> Self::Snapshot {
    self.store_handler.get_snapshot().into_latest()
        .expect("snapshot should be at latest version")
}

this is the one I want someone else's eyes on. it's an .expect() sitting on the replication snapshot path - if a node ever hands back a snapshot that isn't at the latest version, this panics instead of returning an error. probably can't happen given how migration is wired right now, but "probably can't happen" on a path that takes down a node is exactly the kind of thing I don't want to just wave through myself.

restore_snapshot - completely untouched.

operations.rs - all schema plumbing, nothing clever:

  • resolve_schema() helper, Option<String> to Schema, defaults to public
  • create_store/drop_store/list_stores take schema now, None behaves identically to before
  • new drop_schema, self-contained
  • threw in explicit type annotations on non_linear_indices (e.g. StdHashSet<non_linear_index::Index>) because the compiler asked nicely. no behavior change.

test files - cluster_tests.rs and replication_store_tests.rs, every diff is just schema: None added to existing calls. didn't touch any test logic.

so: nothing in Jim's clustering paths actually changed, and everything new degrades to old behavior when schema is None. I'm fairly confident in all of it except the .expect() in get_snapshot - would like one of you to sanity check whether that assumption actually holds before this gets accepted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Logical separation for stores

2 participants