Skip to content

[fix] Fix BatchMeta.union semantics#95

Merged
0oshowero0 merged 2 commits into
Ascend:mainfrom
0oshowero0:fix_batchmeta
May 13, 2026
Merged

[fix] Fix BatchMeta.union semantics#95
0oshowero0 merged 2 commits into
Ascend:mainfrom
0oshowero0:fix_batchmeta

Conversation

@0oshowero0
Copy link
Copy Markdown
Collaborator

@0oshowero0 0oshowero0 commented May 12, 2026

Problem

PR https://gitcode.com/Ascend/TransferQueue/pull/28 incorrectly rewrote BatchMeta.union to behave like concat with global_index deduplication.

  • Original semantics: merge fields for samples with identical global_indexes (row-aligned, column-expanded).
  • Broken semantics: append rows from other whose global_indexes are not present in self.

This broke the design boundary between union (same rows, merge columns) and concat (same columns, append rows).

Changes

1. Restore BatchMeta.union (transfer_queue/metadata.py)

  • Validate that both batches have the same size, global_indexes, and partition_ids.
  • Merge field_schema with other overriding self on name conflicts.
  • Merge production_status conservatively via np.bitwise_and (both sides must report ready).
  • Merge extra_info, custom_meta, and _custom_backend_meta per sample.

2. Update tutorial (tutorial/03_metadata_concepts.py)

  • Example now uses overlapping fields (attention_mask present in both batches) to demonstrate the override behavior.
  • Corrected comments to clearly distinguish concat (append rows) vs union (merge columns).

3. Update unit tests (tests/test_metadata.py)

Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>
Copilot AI review requested due to automatic review settings May 12, 2026 08:53
@ascend-robot
Copy link
Copy Markdown

CLA Signature Pass

0oshowero0, thanks for your pull request. All authors of the commits have signed the CLA. 👍

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR changes BatchMeta.union from “deduplicate by global_index” semantics to “merge field schemas for the same samples”, requiring identical global_indexes and partition_ids and combining metadata/flags accordingly. It updates the tutorial to reflect the new meaning of union vs concat and adds tests covering the revised behavior.

Changes:

  • Redefined BatchMeta.union to require matching indices/partitions and to merge field_schema, production_status, extra_info, and per-sample metadata.
  • Updated the metadata tutorial example and explanation to match the new “append columns/fields” union semantics.
  • Added unit tests validating success cases, overlap behavior, conservative readiness merging, and mismatch errors.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
tutorial/03_metadata_concepts.py Updates tutorial example/output to demonstrate field-merge union semantics.
transfer_queue/metadata.py Reimplements BatchMeta.union to merge fields for identical samples and adds validation.
tests/test_metadata.py Adds tests for the new union behavior and validation errors.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread transfer_queue/metadata.py Outdated
Comment on lines +631 to +643
if self.global_indexes != other.global_indexes:
raise ValueError(
f"BatchMeta.union: global_indexes do not match. "
f"self.global_indexes={self.global_indexes}, "
f"other.global_indexes={other.global_indexes}"
)

if not unique_indices_in_other:
return self
if self.partition_ids != other.partition_ids:
raise ValueError(
f"BatchMeta.union: partition_ids do not match. "
f"self.partition_ids={self.partition_ids}, "
f"other.partition_ids={other.partition_ids}"
)
Signed-off-by: 0oshowero0 <o0shower0o@outlook.com>
@ascend-robot
Copy link
Copy Markdown

CLA Signature Pass

0oshowero0, thanks for your pull request. All authors of the commits have signed the CLA. 👍

@0oshowero0 0oshowero0 changed the title [fix] Fix BatchMeta.union functionality [fix] Fix BatchMeta.union semantics May 12, 2026
@0oshowero0 0oshowero0 merged commit 270ea73 into Ascend:main May 13, 2026
8 checks passed
@0oshowero0 0oshowero0 deleted the fix_batchmeta branch May 13, 2026 07:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants