Describe the bug
I am having a hard time getting segger to work, even at the data-loading step.
I have installed the main branch.
When I follow the Introduction to Segger tutorial on https://elihei2.github.io/segger_dev, after running:
merscope_data_dir = Path('/beegfs/scratch/prj/Spatial/data/merscope/human_brain_1k')
segger_data_dir = Path('/beegfs/scratch/prj/Spatial/results/merscope/human_brain_1k/segger')
sample = STSampleParquet(
    base_dir=merscope_data_dir,
    n_workers=4,
    sample_type='merscope'
)
I get:
Traceback (most recent call last):
File "/opt/common/tools/ric.iannacone/envs/segger-env/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3812, in get_loc
return self._engine.get_loc(casted_key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "pandas/_libs/index.pyx", line 167, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 196, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 7096, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'global_x'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/beegfs/scratch/prj/Spatial/benchmark_ist/code/segger/src/segger/data/parquet/sample.py", line 85, in __init__
utils.ensure_transcript_ids(
File "/beegfs/scratch/prj/Spatial/code/segger/src/segger/data/parquet/_utils.py", line 499, in ensure_transcript_ids
df = add_transcript_ids(df, x_col, y_col, id_col, precision)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/beegfs/scratch/prj/Spatial/code/segger/src/segger/data/parquet/_utils.py", line 445, in add_transcript_ids
x_coords = np.round(transcripts_df[x_col] * precision).astype(int).astype(str)
~~~~~~~~~~~~~~^^^^^^^
File "/opt/common/tools/envs/segger-env/lib/python3.11/site-packages/pandas/core/frame.py", line 4107, in __getitem__
indexer = self.columns.get_loc(key)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/common/tools/envs/segger-env/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3819, in get_loc
raise KeyError(key) from err
KeyError: 'global_x'
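The traceback shows `add_transcript_ids` indexing the transcripts table with `x_col='global_x'`, so the file being read apparently has no column of that name. A minimal sketch for checking this, assuming `pyarrow` is installed: the helper names `missing_columns` and `parquet_column_names` are hypothetical, and `global_y` as the matching y column is an assumption (only `global_x` appears in the traceback).

```python
from pathlib import Path


def missing_columns(available, required=("global_x", "global_y")):
    """Return the required column names that are absent from `available`.

    The default `required` tuple is an assumption: only 'global_x' is
    confirmed by the traceback.
    """
    present = set(available)
    return [name for name in required if name not in present]


def parquet_column_names(path):
    """Read only the schema of a parquet file and return its column names."""
    # Imported lazily so that missing_columns() works without pyarrow.
    import pyarrow.parquet as pq

    return pq.read_schema(str(Path(path))).names


# Example use (the file name is hypothetical, adjust to the actual data):
# names = parquet_column_names(merscope_data_dir / "detected_transcripts.parquet")
# print(missing_columns(names))
```

If the list printed is non-empty, the coordinate columns are named differently in the file than the loader expects.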
When I instead follow the MERSCOPE dataset creation steps, with:
from segger.data import MerscopeSample
First I get:
>>> from segger.data import MerscopeSample
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: cannot import name 'MerscopeSample' from 'segger.data' (/beegfs/scratch/prj/Spatial/code/segger/src/segger/data/__init__.py)
I then worked around this with:
from segger.data.io import MerscopeSample
But when I try:
# Create a MerscopeSample instance for spatial transcriptomics processing
merscope_sample = MerscopeSample()
# Load transcripts from a CSV file
merscope_sample.load_transcripts(
    base_path=merscope_data_dir,
    sample=sample_tag,
    transcripts_filename="detected_transcripts.csv",
    file_format="csv"
)
I get:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/beegfs/scratch/ric.iannacone/ric.iannacone/prj/Spatial/benchmark_ist/code/segger/src/segger/data/io.py", line 130, in load_transcripts
raise ValueError("This version only supports parquet files with Dask.")
ValueError: This version only supports parquet files with Dask.
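Since the error says only parquet input is supported, one possible workaround is to convert the CSV to parquet first. This is a rough sketch, assuming `pandas` with a parquet engine such as `pyarrow` is installed; the helper names are hypothetical, and whether segger then accepts the resulting file (name, column layout) is not confirmed by anything above.

```python
from pathlib import Path


def parquet_path_for(csv_path):
    """Return the path of the parquet file that sits next to `csv_path`."""
    return Path(csv_path).with_suffix(".parquet")


def csv_to_parquet(csv_path):
    """Convert a CSV file to parquet and return the new file's path.

    Reads the whole CSV into memory, which may be too much for very
    large transcript tables.
    """
    # Imported lazily so that parquet_path_for() works without pandas.
    import pandas as pd

    out_path = parquet_path_for(csv_path)
    pd.read_csv(csv_path).to_parquet(out_path, index=False)
    return out_path


# Example use (the file location is an assumption):
# csv_to_parquet(merscope_data_dir / "detected_transcripts.csv")
```

The converted file could then be passed with `file_format="parquet"`, if that is the value `load_transcripts` expects.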
- OS: Ubuntu 22.04.5 LTS
- Python version: 3.11.13
- Package version: segger_dev main branch