fix #391: convert GEF gene names to Unicode in bins reader path #437

Draft
cursor[bot] wants to merge 1 commit into main from cursorstereopy-issue-handling-ad8e


cursor[bot] commented Mar 30, 2026

Summary

When reading GEF files with bin_type='bins', the BgefR reader path did not convert gene names from byte strings (returned by gefpy) to Unicode strings. As a result, every downstream gene name comparison, including the one in SingleR annotation, failed silently, since b'GeneX' != 'GeneX' in Python 3. The cell_bins path (CgefR) already performed this conversion correctly.
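The mismatch described above can be reproduced in isolation. The following is a minimal sketch, not the reader code itself: the gene names are made up, and the arrays only imitate the byte-string output attributed to gefpy. It shows that a set intersection between byte strings and Unicode strings is empty until the array is converted with .astype('U').

```python
import numpy as np

# Hypothetical gene names, shaped like the byte strings the bins path produced.
raw_names = np.array([b"GeneA", b"GeneB", b"GeneC"])
# Hypothetical Unicode gene names, e.g. from a reference dataset.
reference = np.array(["GeneB", "GeneC", "GeneD"])


def shared(a, b):
    """Return the sorted intersection of two arrays as Python objects."""
    return sorted(set(a.tolist()) & set(b.tolist()))


# bytes and str never compare equal in Python 3, so nothing matches.
shared_before = shared(raw_names, reference)

# Converting the byte-string array to a Unicode dtype restores the overlap.
fixed = raw_names.astype("U")
shared_after = shared(fixed, reference)

print(shared_before)  # []
print(shared_after)   # ['GeneB', 'GeneC']
```

Note that .astype('U') on a byte-string array assumes the names are ASCII; names containing non-ASCII bytes would need an explicit decode instead.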

Changes

  • stereo/io/reader.py: Added .astype('U') conversion for gene_names and gene_id in both the filtered (gene_list/region) and unfiltered BgefR code paths, matching the existing CgefR behavior.
  • stereo/algorithm/single_r/single_r.py: Improved the SingleR gene intersection error message to include sample gene names and their Python types from both test and reference datasets, making future debugging much easier.
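As an illustration of the second change, here is a sketch of what an intersection check with a more informative error message could look like. The function names describe_gene_names and intersect_or_raise are hypothetical and are not taken from single_r.py; the exact wording of the real message is not shown in this PR description.

```python
def describe_gene_names(label, names, n=3):
    """Summarize the first n gene names and their Python types (hypothetical helper)."""
    sample = list(names)[:n]
    types = sorted({type(x).__name__ for x in sample})
    return f"{label}: sample={sample!r}, types={types}"


def intersect_or_raise(test_genes, ref_genes):
    """Return shared gene names, or raise with diagnostic details if there are none."""
    common = set(test_genes) & set(ref_genes)
    if not common:
        raise ValueError(
            "No common genes between test and reference datasets. "
            + describe_gene_names("test", test_genes)
            + "; "
            + describe_gene_names("reference", ref_genes)
        )
    return sorted(common)


try:
    intersect_or_raise([b"GeneA", b"GeneB"], ["GeneA", "GeneB"])
except ValueError as err:
    message = str(err)
    print(message)
```

With byte strings on one side and Unicode strings on the other, the message reports both `bytes` and `str`, which points directly at the type mismatch rather than at the data.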

Verification

  • AST syntax check passed for both modified files
  • No new dependencies introduced
  • Python 3.8 compatible (no walrus operator, match/case, or union types)
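The PR does not say how the AST syntax check was run. One plausible form, shown here as a sketch with a hypothetical helper name, parses each source string with the standard-library ast module and reports whether it parses.

```python
import ast


def syntax_ok(source, filename="<string>"):
    """Return True if the source parses as Python, False on a SyntaxError."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError:
        return False
    return True


# Inline snippets stand in for the contents of the modified files.
good = "gene_names = gene_names.astype('U')\n"
bad = "def broken(:\n    pass\n"

print(syntax_ok(good))  # True
print(syntax_ok(bad))   # False
```

A check of this kind only confirms that the files parse; it does not execute the changed code paths.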

Classification

  • Type: bug
  • Confidence: high
  • Severity: high

Closes #391

The BgefR (bins) code path in read_gef() was not converting gene names
from byte strings to Unicode strings, unlike the CgefR (cell_bins) path
which already called .astype('U'). This caused all gene name comparisons
with Unicode-based reference data (e.g., H5AD files) to fail silently
in Python 3, since b'GeneX' != 'GeneX'.

Changes:
- stereo/io/reader.py: Add .astype('U') for gene_names and gene_id in
  both the filtered and unfiltered BgefR code paths, matching the
  existing CgefR behavior.
- stereo/algorithm/single_r/single_r.py: Improve the SingleR error
  message to include sample gene names and their types from both test
  and reference datasets for easier debugging.

Co-authored-by: wanruiwen-genomics-cn <wanruiwen-genomics-cn@users.noreply.github.com>
Development

Successfully merging this pull request may close these issues.

Regarding the issue of using gef for SingleR annotation
