fix #391: convert GEF gene names to Unicode in bins reader path by cursor[bot] · Pull Request #437 · STOmics/Stereopy

cursor · 2026-03-30T03:28:15Z

Summary

When reading GEF files with bin_type='bins', the BgefR reader path did not convert gene names from byte strings (returned by gefpy) to Unicode strings. This caused all downstream gene name comparisons — such as those in SingleR annotation — to fail silently, since b'GeneX' != 'GeneX' in Python 3. The cell_bins path (CgefR) already performed this conversion correctly.
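The mismatch described above can be reproduced without a GEF file. This is a minimal sketch: the array below is only a stand-in for the byte strings that gefpy returns, and the gene names and reference set are made up for illustration.

```python
import numpy as np

# Stand-in for the byte-string gene names returned by gefpy (illustrative values).
raw_gene_names = np.array([b"GeneA", b"GeneB", b"GeneC"])
# Stand-in for Unicode gene names from a reference dataset such as an H5AD file.
reference_genes = {"GeneB", "GeneC", "GeneD"}

# bytes and str never compare equal in Python 3, so the overlap is empty.
overlap_before = set(raw_gene_names.tolist()) & reference_genes

# The conversion this PR adds: cast the byte-string array to a Unicode array.
gene_names = raw_gene_names.astype("U")
overlap_after = set(gene_names.tolist()) & reference_genes

print(raw_gene_names.dtype.kind, "->", gene_names.dtype.kind)
print(sorted(overlap_before), sorted(overlap_after))
```

No exception is raised in the first comparison, which is why the failure is silent: the intersection is simply empty.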

Changes

stereo/io/reader.py: Added .astype('U') conversion for gene_names and gene_id in both the filtered (gene_list/region) and unfiltered BgefR code paths, matching the existing CgefR behavior.
stereo/algorithm/single_r/single_r.py: Improved the SingleR gene intersection error message to include sample gene names and their Python types from both test and reference datasets, making future debugging much easier.
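To illustrate the kind of error message the second change describes, here is a rough sketch. The function names `describe_gene_names` and `intersect_genes` are hypothetical and are not taken from single_r.py; the actual wording and structure in the PR may differ.

```python
def describe_gene_names(label, gene_names, n=5):
    """Summarize the first n gene names and the Python types they have."""
    sample = list(gene_names[:n])
    type_names = sorted({type(g).__name__ for g in sample})
    return f"{label}: sample={sample!r}, types={type_names}"


def intersect_genes(test_genes, ref_genes):
    """Return the shared gene names, or raise an error that shows both inputs."""
    common = sorted(set(test_genes) & set(ref_genes))
    if not common:
        raise ValueError(
            "No common genes between test and reference datasets. "
            + describe_gene_names("test", test_genes)
            + "; "
            + describe_gene_names("reference", ref_genes)
        )
    return common


# Byte-string test genes against Unicode reference genes: no overlap.
try:
    intersect_genes([b"GeneA", b"GeneB"], ["GeneA", "GeneB"])
except ValueError as err:
    print(err)
```

Printing the types next to the sample names makes a bytes/str mismatch visible at a glance, instead of leaving the reader with an unexplained empty intersection.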

Verification

AST syntax check passed for both modified files
No new dependencies introduced
Python 3.8 compatible (no match/case or union type syntax, both of which require Python 3.10; the walrus operator is also not used)
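The PR does not say how the AST syntax check was run. One plausible way to do such a check, shown here as a sketch with a hypothetical helper name `syntax_ok`, is to parse each file's source with the standard library `ast` module:

```python
import ast


def syntax_ok(source, filename="<string>"):
    """Return True if the source parses as valid Python, False otherwise."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError:
        return False
    return True


# Valid and invalid snippets, standing in for the contents of the modified files.
print(syntax_ok("gene_names = raw.astype('U')"))
print(syntax_ok("def broken(:"))
```

A check like this confirms only that the file parses under the interpreter running it. It does not execute the code or verify its behavior.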

Classification

Type: bug
Confidence: high
Severity: high

Closes #391

The BgefR (bins) code path in read_gef() was not converting gene names from byte strings to Unicode strings, unlike the CgefR (cell_bins) path which already called .astype('U'). This caused all gene name comparisons with Unicode-based reference data (e.g., H5AD files) to fail silently in Python 3, since b'GeneX' != 'GeneX'.

Changes:
- stereo/io/reader.py: Add .astype('U') for gene_names and gene_id in both the filtered and unfiltered BgefR code paths, matching the existing CgefR behavior.
- stereo/algorithm/single_r/single_r.py: Improve the SingleR error message to include sample gene names and their types from both test and reference datasets for easier debugging.

Co-authored-by: wanruiwen-genomics-cn <wanruiwen-genomics-cn@users.noreply.github.com>

wanruiwen-genomics-cn mentioned this pull request Mar 30, 2026

Regarding the issue of using gef for SingleR annotation #391

Open

cursor[bot] wants to merge 1 commit into main from cursorstereopy-issue-handling-ad8e
