fix(serialization): use HashingTag so BSL metadata contributes to content hash by ghoersti · Pull Request #264 · boringdata/boring-semantic-layer

ghoersti · 2026-05-19T02:40:16Z

Closes #263

Summary

to_tagged() wraps BSL expressions in a tag node so xorq build/catalog tooling can recover the BSL metadata. Previously it used Table.tag(...), which produces a plain Tag op. xorq's hash machinery (opaque_node_replacer in dask_normalize_expr) is transparent to plain Tag — only HashingTag participates in the content hash. As a result, source and source.tag("bsl", **metadata) produced identical content hashes, so two xorq build invocations — one for a raw source, one for a BSL model on top of the same source — silently overwrote each other under builds/<hash>/.

Switching to Table.hashing_tag(...) makes BSL metadata participate in the content hash. HashingTag is a Tag subclass, so existing isinstance(op, Tag) checks in serialization/reconstruct.py continue to work unchanged.

Change

src/boring_semantic_layer/serialization/__init__.py

- xorq_table = xorq_table.tag(tag="bsl", **tag_data)
+ xorq_table = xorq_table.hashing_tag(tag="bsl", **tag_data)

Tests

Four new regression tests in tests/test_xorq_tag_handler.py pin the contract:

test_tagged_op_is_hashing_tag — concrete op type is HashingTag, not bare Tag.
test_tagged_hash_differs_from_untagged_source — dask.base.tokenize of tagged vs. untagged differs.
test_tagged_hash_distinguishes_models — two different BSL models on the same ibis table produce different hashes (the user-visible bug from to_tagged() uses Tag instead of HashingTag, causing build hash collisions #263).
test_tagged_hash_is_deterministic — to_tagged() is stable across calls.

Verified the first three fail on the pre-fix code (hashes match, collision reproduced) and all four pass on the fix. Full suite: 989 passed, 15 skipped, 10 xfailed, 6 xpassed (was 985 — +4 new tests).

Test plan

New regression tests fail without the fix and pass with it
Full pytest src/boring_semantic_layer/tests/ green
to_tagged / from_tagged round-trip unchanged (existing tag-handler tests pass)

Follow-up

A separate PR (george/update/bump-xorq-0-3-24) will bump xorq>=0.3.24 and address the test failures surfaced by that bump (renamed xo.read_parquet → xo.deferred_read_parquet and tightened type inference on xorq.vendor.ibis literals).

🤖 Generated with Claude Code

…tent hash Closes boringdata#263 Both `to_tagged()` and `reemit()` previously wrapped BSL expressions in a plain Tag node, which xorq's `opaque_node_replacer` strips during content-hash computation. The result: `source` and `source.tag("bsl", **metadata)` produced identical hashes, so two `xorq build` invocations (one for the raw source, one for a BSL model over it) silently overwrote each other under `builds/<hash>/`. The same regression applied to rebuilt artifacts coming out of `reemit`. Switch both call sites to `Table.hashing_tag(...)` so BSL metadata participates in the hash. HashingTag is a Tag subclass, so existing `isinstance(op, Tag)` checks in the reconstruct path are unaffected. Tests: - Drop `@pytest.mark.xfail` from the pre-existing `test_different_measures_produce_different_hashes` in test_xorq_convert.py — that test was xfailed against the bug and now passes (uses xorq's authoritative `compute_expr_hash`). - Add two new tests in test_xorq_tag_handler.py: * test_tagged_op_is_hashing_tag — outer op type is HashingTag. * test_tagged_hash_differs_from_untagged_source — tagged hash differs from bare source (covers untagged-vs-tagged, no pre-existing duplicate). - Add two new reemit regression tests: * test_reemit_preserves_hashing_tag — pin that `reemit` re-stamps with HashingTag, so rebuild paths keep the hash contract. * test_reemit_hash_distinguishes_metadata — round-trip via `to_tagged → reemit` still produces distinct hashes for distinct metadata. All new tests use `compute_expr_hash` (xorq's content-hash function) rather than `dask.base.tokenize` for consistency with the existing tests in test_xorq_convert.py. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Patch bump covering join fixes on plain ibis backends (boringdata#222), grain mismatch on `with_measures` over joins (boringdata#261), the ibis-native calc-measure classifier refactor (boringdata#262), and the HashingTag serialization fix (boringdata#264). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

ghoersti force-pushed the george/fix/tag-to-hashing-tag branch from 6576c32 to 9beb3e2 Compare May 19, 2026 04:12

ghoersti mentioned this pull request May 19, 2026

chore(deps): bump xorq>=0.3.24 and adapt tests to API changes #265

Closed

2 tasks

hussainsultan merged commit 68a65d3 into boringdata:main May 19, 2026
3 checks passed

hachej mentioned this pull request May 19, 2026

chore: prepare v0.3.14 release #266

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(serialization): use HashingTag so BSL metadata contributes to content hash#264

fix(serialization): use HashingTag so BSL metadata contributes to content hash#264
hussainsultan merged 1 commit into
boringdata:mainfrom
ghoersti:george/fix/tag-to-hashing-tag

ghoersti commented May 19, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ghoersti commented May 19, 2026

Summary

Change

Tests

Test plan

Follow-up

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants