CHOIR takes more than 6 hours to process 170k cells using 100 CPUs? #44

Description

@LongpanICR

Hi development team,
It's a very interesting tool. I tried to use CHOIR to determine the precise number of clusters in a dataset of 170k cells on an HPC with 100 CPUs, but building the tree alone has taken more than 6 hours. I also checked the memory usage in RStudio, and it showed only 20-30 GB in use. This makes me doubt whether all 100 CPUs were actually used during the calculation. Or is this runtime expected for CHOIR?

Thank you!
Long

Here is the console output from the analysis:

> cat("Number of genes with zero counts across all cells:", sum(genes_all_zero), "\n")
Number of genes with zero counts across all cells: 2091 
> seurat_obj <- seurat_obj[!genes_all_zero, ]
> seurat_obj
An object of class Seurat 
33034 features across 170036 samples within 1 assay 
Active assay: RNA (33034 features, 0 variable features)
 2 layers present: counts, data
> # Step 1: Generate hierarchical clustering tree
> # ?buildTree
> seurat_obj <- buildTree(seurat_obj,
+                         cluster_params = list(algorithm = 1, 
+                                               group.singletons = TRUE),
+                         n_cores = 100)
2025-08-07 04:00:42 PM : (Step 1/7) Checking inputs and preparing object..

Input data:
 - Object type: Seurat (v5)
 - # of cells: 170036
 - # of batches: 1
 - # of modalities: 1
 - ATAC data: FALSE
 - Countsplitting: FALSE
 - Assay: RNA
 - Layer used to build tree: data
 - Layer used to prune tree: data

Proceeding with the following parameters:
 - Intermediate data stored under key: CHOIR
 - Alpha: 0.05
 - Multiple comparison adjustment: bonferroni
 - Features to train RF: var
 - # of excluded features: 0
 - # of permutations: 100
 - # of RF trees: 50
 - Use variance: TRUE
 - Minimum accuracy: 0.5
 - Minimum connections: 1
 - Maximum repeated errors: 20
 - Maximum cells sampled: Inf
 - Downsampling rate: 0.173
 - Minimum reads: >0 reads
 - Maximum clusters: auto
 - Minimum cluster depth: 2000
 - Normalization method: none
 - Subtree dimensionality reductions: TRUE
 - Dimensionality reduction method: Default
 - Dimensionality reduction parameters provided: No
 - # of variable features: Default
 - Batch correction method: none
 - Batch correction parameters provided: No
 - Nearest neighbor parameters provided: 
     - verbose: FALSE
 - Clustering parameters provided: 
     - algorithm: 1
     - group.singletons: TRUE
     - verbose: FALSE
 - # of cores: 100
 - Random seed: 1

2025-08-07 04:00:42 PM : (Step 2/7) Running initial dimensionality reduction..
2025-08-07 04:00:42 PM : Preparing matrix using 'RNA' assay and 'data' slot..
2025-08-07 04:00:53 PM : Running PCA with 2000 variable features..
2025-08-07 04:01:59 PM : (Step 3/7) Generating initial nearest neighbors graph..
2025-08-07 04:02:59 PM : (Step 4/7) Identify starting clustering resolution..
2025-08-07 05:03:24 PM : At resolution = 1, 106 clusters. [2 iterations]
2025-08-07 05:46:46 PM : At resolution = 0.7, 94 clusters. [3 iterations]
2025-08-07 06:21:00 PM : At resolution = 0.4, 73 clusters. [4 iterations]
2025-08-07 06:50:21 PM : Starting resolution: 0.1
2025-08-07 06:50:21 PM : (Step 5/7) Building root clustering tree..
2025-08-07 07:24:32 PM : At resolution = 0.08, 32 clusters. [3 iterations]
2025-08-07 08:13:26 PM : At resolution = 0.12, 39 clusters. [6 iterations]
2025-08-07 08:46:06 PM : At resolution = 0.07, 29 clusters. [8 iterations]
2025-08-07 09:20:08 PM : At resolution = 0.09, 34 clusters. [10 iterations]
2025-08-07 09:57:35 PM : At resolution = 0.11, 39 clusters. [12 iterations]
2025-08-07 10:37:39 PM : At resolution = 0.13, 41 clusters. [14 iterations]
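
To see where the time goes, the per-stage durations can be tallied from the timestamps in the log above. This is only a minimal sketch in base R: the checkpoint names (`step1_start` and so on) are labels chosen here for illustration, not CHOIR output, and parsing `%p` (AM/PM) assumes an English or C locale.

```r
# Timestamps copied from the buildTree() log above (12-hour clock).
# The names are illustrative labels, not part of CHOIR's output.
log_times <- c(
  step1_start = "2025-08-07 04:00:42 PM",  # (Step 1/7) checking inputs
  step3_start = "2025-08-07 04:01:59 PM",  # (Step 3/7) nearest neighbors graph
  step4_start = "2025-08-07 04:02:59 PM",  # (Step 4/7) starting resolution
  step5_start = "2025-08-07 06:50:21 PM",  # (Step 5/7) root clustering tree
  last_line   = "2025-08-07 10:37:39 PM"   # last log line shown
)

# %I = hour on a 12-hour clock, %p = AM/PM indicator (locale-dependent).
parsed <- as.POSIXct(unname(log_times),
                     format = "%Y-%m-%d %I:%M:%S %p", tz = "UTC")

# Minutes elapsed between consecutive checkpoints.
elapsed_min <- as.numeric(diff(parsed), units = "mins")
names(elapsed_min) <- paste(head(names(log_times), -1),
                            "->",
                            tail(names(log_times), -1))

print(round(elapsed_min, 1))
cat("Total minutes so far:", round(sum(elapsed_min), 1), "\n")
```

By this tally, PCA and the nearest neighbors graph together take about 2 minutes, while the clustering in steps 4 and 5 accounts for roughly 167 and 227 minutes respectively.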