
[BUG] Errors with Dataset Creation #122

@PietroD


Describe the bug
I am having a hard time getting segger to work, even at the data-loading step.
I have installed the main branch.

When I follow the Introduction to Segger tutorial on https://elihei2.github.io/segger_dev, after running:

from pathlib import Path
from segger.data.parquet.sample import STSampleParquet

merscope_data_dir = Path('/beegfs/scratch/prj/Spatial/data/merscope/human_brain_1k')
segger_data_dir = Path('/beegfs/scratch/prj/Spatial/results/merscope/human_brain_1k/segger')

sample = STSampleParquet(
    base_dir=merscope_data_dir,
    n_workers=4,
    sample_type='merscope'
)

I get:

Traceback (most recent call last):
  File "/opt/common/tools/ric.iannacone/envs/segger-env/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3812, in get_loc
    return self._engine.get_loc(casted_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7096, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'global_x'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/beegfs/scratch/prj/Spatial/benchmark_ist/code/segger/src/segger/data/parquet/sample.py", line 85, in __init__
    utils.ensure_transcript_ids(
  File "/beegfs/scratch/prj/Spatial/code/segger/src/segger/data/parquet/_utils.py", line 499, in ensure_transcript_ids
    df = add_transcript_ids(df, x_col, y_col, id_col, precision)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/beegfs/scratch/prj/Spatial/code/segger/src/segger/data/parquet/_utils.py", line 445, in add_transcript_ids
    x_coords = np.round(transcripts_df[x_col] * precision).astype(int).astype(str)
                        ~~~~~~~~~~~~~~^^^^^^^
  File "/opt/common/tools/envs/segger-env/lib/python3.11/site-packages/pandas/core/frame.py", line 4107, in __getitem__
    indexer = self.columns.get_loc(key)
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/common/tools/envs/segger-env/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3819, in get_loc
    raise KeyError(key) from err
KeyError: 'global_x'
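
The `KeyError` means the transcripts table that segger loaded has no `global_x` column. A minimal sketch for checking which expected columns are missing is below. The helper names are hypothetical, `global_y` is assumed to be the matching y column, and the parquet file name in the usage comment is a guess rather than something confirmed by the traceback.

```python
def missing_columns(available, required):
    """Return the required column names that are absent, in the order given."""
    present = set(available)
    return [name for name in required if name not in present]


def parquet_column_names(path):
    """Read only the schema of a parquet file (requires pyarrow)."""
    import pyarrow.parquet as pq  # imported lazily so the helper above has no dependencies

    return pq.read_schema(str(path)).names


# Example with a made-up column list, to show the shape of the result.
example_columns = ["gene", "x", "y", "fov"]
print(missing_columns(example_columns, ["global_x", "global_y"]))
# -> ['global_x', 'global_y']

# Hypothetical usage against real data (file name is an assumption):
# cols = parquet_column_names(merscope_data_dir / "transcripts.parquet")
# print(missing_columns(cols, ["global_x", "global_y"]))
```

If the list comes back non-empty, the file uses different coordinate column names than the ones segger looks up for `sample_type='merscope'`.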

When I instead try the Merscope dataset creation example, starting with:

from segger.data import MerscopeSample

First I get:

>>> from segger.data import MerscopeSample
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: cannot import name 'MerscopeSample' from 'segger.data' (/beegfs/scratch/prj/Spatial/code/segger/src/segger/data/__init__.py)

I can work around this with:

from segger.data.io import MerscopeSample

But when I try:

# Create a MerscopeSample instance for spatial transcriptomics processing
merscope_sample = MerscopeSample()

# Load transcripts from a CSV file
merscope_sample.load_transcripts(
    base_path=merscope_data_dir,
    sample=sample_tag,
    transcripts_filename="detected_transcripts.csv",
    file_format="csv"
)

I get:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/beegfs/scratch/ric.iannacone/ric.iannacone/prj/Spatial/benchmark_ist/code/segger/src/segger/data/io.py", line 130, in load_transcripts
    raise ValueError("This version only supports parquet files with Dask.")
ValueError: This version only supports parquet files with Dask.
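
Since the error says only parquet input is supported, one possible workaround is to convert the CSV to parquet first. This is a minimal sketch using pandas, not a confirmed fix: the helper names are hypothetical, and whether `load_transcripts` then accepts the converted file, or which file name it expects, has not been verified here.

```python
from pathlib import Path


def parquet_path_for(csv_path):
    """Return the sibling path of a CSV file with a .parquet suffix."""
    return Path(csv_path).with_suffix(".parquet")


def csv_to_parquet(csv_path):
    """Convert a CSV file to parquet next to the original.

    Requires pandas plus a parquet engine (pyarrow or fastparquet).
    The whole CSV is read into memory, which may be too much for very
    large transcript tables.
    """
    import pandas as pd  # imported lazily; only needed for the conversion

    out_path = parquet_path_for(csv_path)
    pd.read_csv(csv_path).to_parquet(out_path, index=False)
    return out_path


# Hypothetical usage (paths are placeholders for the real sample layout):
# csv_to_parquet(merscope_data_dir / "detected_transcripts.csv")
```

The converted file would then be passed with `file_format="parquet"` and the new file name, assuming those arguments behave as their names suggest.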

Expected behavior
A clear and concise description of what you expected to happen.

  • OS: Ubuntu 22.04.5 LTS
  • Python version: 3.11.13
  • Package version: segger_dev main branch
