Hello dev team,
Because of memory limits I have been hitting when applying CHOIR to a particularly large dataset, I have been experimenting with the `sample_max` and `max_clusters` parameters. However, setting `max_clusters` to any value other than `"auto"` causes an error, which I traced as follows:
- Line 1296 of `buildTree.R` sets `subtree_sizes` to `NULL` whenever `max_clusters` is not `"auto"`: `"subtree_sizes" = if(max_clusters == "auto", c(n_cells, subtree_sizes), NULL)`
- Line 409 of `pruneTree.R` does the following: `subtree_names_filtered <- subtree_names[subtree_sizes > 3]`. Because `subtree_sizes` is `NULL`, this makes `subtree_names_filtered` equal to `character(0)`.
- Line 411 of `pruneTree.R` sets `n_subtrees_filtered <- length(subtree_names_filtered)`, i.e. `n_subtrees_filtered` is 0.
- Lines 512-516 of `pruneTree.R`: if `buildTree_parameters[["subtree_reductions"]] == TRUE`, then `n_input_matrices <- n_subtrees_filtered`, i.e. `n_input_matrices` is 0.
- Line 523 of `pruneTree.R` then runs the for-loop `for (subtree in 1:n_input_matrices)`. Since `1:0` evaluates to `c(1, 0)`, the loop runs twice, and in the second iteration `subtree` is 0, which leads to the following error on line 584: `Error in input_matrices[[subtree]] <- input_matrix : attempt to select less than one element in integerOneIndex`
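The chain above reduces to a well-known R pitfall: `1:n` is not empty when `n` is 0. The sketch below reproduces it with hypothetical stand-in values (not CHOIR's actual objects) and shows `seq_len()` as one possible guard. This is a suggestion on my part, not a tested patch; with zero subtrees the downstream code may still need `subtree_sizes` to be computed for the pruning to be meaningful.

```r
# Hypothetical stand-ins mirroring the state described above when
# max_clusters != "auto"; these are NOT CHOIR's actual objects.
subtree_sizes <- NULL
subtree_names <- c("subtree_A", "subtree_B")  # illustrative names only

# NULL > 3 is logical(0), so subsetting keeps nothing: character(0)
subtree_names_filtered <- subtree_names[subtree_sizes > 3]
n_input_matrices <- length(subtree_names_filtered)  # 0

# 1:0 counts down and yields c(1, 0), so the loop runs twice:
# subtree = 1 succeeds, subtree = 0 fails on the [[<- assignment
buggy <- list()
err <- tryCatch({
  for (subtree in 1:n_input_matrices) {
    buggy[[subtree]] <- matrix(0, nrow = 1, ncol = 1)
  }
  NULL
}, error = function(e) conditionMessage(e))
print(err)

# seq_len(0) is integer(0), so the loop body is skipped entirely
input_matrices <- list()
for (subtree in seq_len(n_input_matrices)) {
  input_matrices[[subtree]] <- matrix(0, nrow = 1, ncol = 1)
}
print(length(input_matrices))
```

`seq_along(subtree_names_filtered)` would behave the same way here; either avoids the descending `1:0` sequence.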
On another note, I am applying CHOIR to flow cytometry data, which is a big contributor to the memory and runtime issues I am experiencing and the reason I am experimenting with these parameters. If anyone would be willing to discuss the implications of applying this method to cytometry data, that would be wonderful. I noticed that your paper does not discuss this type of data, and I assume this is because CHOIR is optimised for far higher dimensionality (I only have 27 variables) and fewer cells (some of my datasets contain up to 6.5 million cells). I would love to hear whether using CHOIR on cytometry data was ever considered, and what other challenges you would expect in this scenario.
Thank you for your time!