ITRAP implementation by ArcaneEmergence · Pull Request #44 · SchubertLab/DextraDemixer

ArcaneEmergence · 2025-02-14T12:32:50Z

Resolves #3

Re-implementation of ITRAP, adapted to our data format and scenario. Did not (yet) implement filters based on other information beyond UMI count, as data availability and nomenclature can vary heavily between datasets.

Short summary:

ITRAP tests each clonotype for significant specificity with a Wilcoxon test on the UMI counts of its most and second most abundant epitopes. If the p-value is < 0.05 and the clonotype contains more than 10 cells, the expected target is assigned to the clonotype; otherwise the clonotype is excluded from the subsequent threshold search.
Each cell from a significant clonotype is then assigned individually to the epitope with its highest UMI count. Accuracy is calculated as the agreement between each cell's assigned specificity and the specificity of its clonotype.
Optimal UMI and UMI-ratio thresholds are then searched. The thresholds should filter out noisy cells, and accuracy is calculated on the retained cells only. A grid search finds the thresholds that maximize a weighted average of accuracy and the fraction of retained cells.
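
The clonotype-level test in the first step could be sketched as follows. This is a minimal illustration, not the code in this PR: the function name `clonotype_specificity`, ranking epitopes by summed UMI counts, and the use of a paired one-sided signed-rank test (`scipy.stats.wilcoxon`) are all assumptions.

```python
import numpy as np
from scipy.stats import wilcoxon


def clonotype_specificity(umi, names, alpha=0.05, min_cells=10):
    """Return the epitope name a clonotype is significantly specific for, else None.

    umi:   (n_cells, n_epitopes) UMI counts for the cells of ONE clonotype.
    names: epitope names, one per column of `umi`.
    """
    umi = np.asarray(umi, dtype=float)
    # The clonotype must contain more than `min_cells` cells and at least two epitopes.
    if umi.shape[0] <= min_cells or umi.shape[1] < 2:
        return None
    # Assumption: epitopes are ranked by their summed UMI counts.
    order = np.argsort(umi.sum(axis=0))[::-1]
    top, second = umi[:, order[0]], umi[:, order[1]]
    # The signed-rank test is undefined when all paired differences are zero.
    if np.all(top == second):
        return None
    # Assumption: paired, one-sided test (top epitope greater than runner-up).
    p_value = wilcoxon(top, second, alternative="greater").pvalue
    return names[order[0]] if p_value < alpha else None
```

Clonotypes for which this returns `None` would be left out of the threshold search described in the last step.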

To adapt this to our case, I compare the UMI counts of the epitope against the negative control, and assign filtered-out cells as negative.
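
A rough sketch of the threshold search with this adaptation is shown below. The names `search_thresholds` and `assign_cells`, the pseudo-count in the ratio, and the convention that column 0 holds the negative control are assumptions for illustration; the actual implementation in the PR may differ.

```python
import itertools
import numpy as np


def _top_and_ratio(counts, pseudo):
    """Per-cell highest UMI count, its column index, and top/second ratio."""
    counts = np.asarray(counts, dtype=float)
    ordered = np.sort(counts, axis=1)
    top, second = ordered[:, -1], ordered[:, -2]
    # Assumption: a pseudo-count avoids division by zero.
    return top, counts.argmax(axis=1), top / (second + pseudo)


def search_thresholds(counts, clonotype_call, umi_grid, ratio_grid,
                      weight=0.5, pseudo=1.0):
    """Grid search maximizing weight*accuracy + (1-weight)*retained fraction.

    counts:         (n_cells, n_columns), column 0 = negative control (assumed).
    clonotype_call: per-cell column index assigned to the cell's clonotype.
    Returns (score, umi_thr, ratio_thr), or None if no setting keeps any cell.
    """
    top, cell_call, ratio = _top_and_ratio(counts, pseudo)
    clonotype_call = np.asarray(clonotype_call)
    best = None
    for umi_thr, ratio_thr in itertools.product(umi_grid, ratio_grid):
        keep = (top >= umi_thr) & (ratio >= ratio_thr)
        if not keep.any():
            continue
        # Accuracy is computed on retained cells only.
        accuracy = np.mean(cell_call[keep] == clonotype_call[keep])
        score = weight * accuracy + (1 - weight) * keep.mean()
        if best is None or score > best[0]:
            best = (score, umi_thr, ratio_thr)
    return best


def assign_cells(counts, umi_thr, ratio_thr, pseudo=1.0):
    """Per-cell assignment; filtered-out cells become negative (index 0)."""
    top, cell_call, ratio = _top_and_ratio(counts, pseudo)
    keep = (top >= umi_thr) & (ratio >= ratio_thr)
    return np.where(keep, cell_call, 0)
```

With `weight` closer to 1 the search favors accuracy over the number of retained cells; the right balance is dataset-dependent.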

Alternatively, we can also assign specificity on a clonotype level, using the Wilcoxon test, though this is not the original ITRAP framework.

b-schubert · 2025-02-18T13:11:19Z

+    __name = "ITRAP"
+    __version = "0.0.1"
+
+    def __init__(self, umi_cols=None, umi_count_TRA=None, umi_count_TRB=None, filters=None):


I would suggest moving the UMI parameters (umi_cols, umi_count_TRA, umi_count_TRB) to preprocess_data, as they are dataset-specific and not needed until preprocessing. You can leave filters here, as it affects the algorithm logic.

b-schubert · 2025-02-18T13:14:46Z

+    def __init__(self, umi_cols=None, umi_count_TRA=None, umi_count_TRB=None, filters=None):
+        """
+        Args:
+            umi_cols: List of columns containing UMI counts for pMHCs (default set to ['neg_control', 'pmhc1'])


This variable is already covered in preprocess_data under pmhc_key. I would suggest simply overloading that parameter so it also accepts an iterable of pmhc_keys.
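
One way such an overload could be handled is to normalize the argument at the top of preprocess_data. The helper name `normalize_pmhc_keys` is hypothetical and not part of the repository:

```python
from collections.abc import Iterable


def normalize_pmhc_keys(pmhc_key):
    """Accept a single key or an iterable of keys and always return a list."""
    # A str is itself iterable, so it has to be checked first.
    if isinstance(pmhc_key, str):
        return [pmhc_key]
    if isinstance(pmhc_key, Iterable):
        keys = list(pmhc_key)
        if not keys or not all(isinstance(k, str) for k in keys):
            raise ValueError("pmhc_key must contain at least one string key")
        return keys
    raise TypeError("pmhc_key must be a str or an iterable of str")
```

Existing callers that pass a single string keep working, while ITRAP could pass something like `["neg_control", "pmhc1"]`.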

b-schubert · 2025-02-18T13:16:36Z

+        """
+        Args:
+            umi_cols: List of columns containing UMI counts for pMHCs (default set to ['neg_control', 'pmhc1'])
+            umi_count_TRA: List of columns containing UMI counts for TRA (default: None)


Could you stick to the convention of naming parameters that refer to fields in MuData (.X, .var, .obsm) as xx_key?

b-schubert · 2025-02-18T13:18:49Z

+        for col in self.umi_cols_mhc:
+            data[col] = mdata['gex'][:, col].X.toarray().reshape(-1)
+
+        def calc_delta(x):


Move the internal helper function to the top of the method, so it is the first thing defined in the method body.

b-schubert · 2025-02-18T13:32:59Z

+        self.idx_to_specificity = {i: s for i, s in enumerate(self.umi_cols_mhc)}
+
+        data = mdata['airr'].obs.copy()
+        for col in self.umi_cols_mhc:


Why not use x = gex[:, pmhc_key].X.toarray().reshape((N,))

and store that x in data.X or obsm? That way you don't need to loop through the list of pMHCs, and you make proper use of the AnnData structure.

b-schubert · 2025-02-18T13:36:03Z

+        self.specificity_to_idx = {s: i for i, s in enumerate(self.umi_cols_mhc)}
+        self.idx_to_specificity = {i: s for i, s in enumerate(self.umi_cols_mhc)}
+
+        data = mdata['airr'].obs.copy()


This copy can be expensive depending on what has been done to the MuData object, e.g. it could store UMAP coordinates, TCR similarities, PCA embeddings, and more.

Why not just create an empty internal AnnData to store your algorithm-specific information, and extract only the relevant data from the appropriate fields of the input MuData object?

b-schubert · 2025-02-18T13:44:44Z

+        self.data['assignment_before_filtering'] = self.data['assignment'].copy()
+        self.data.loc[~filters, 'assignment'] = 0
+
+        return self.data['assignment'].values.astype(int), self.data['assignment'].values.astype(float)


Why do you return the assignment twice?
This is definitely not a clean solution. I understand why you did it, but if the current interface does not fit, we need to abstract that interface further: perhaps a more flexible interface, or a super-interface for threshold-based models and a child interface for probabilistic models that inherits from it.

And shouldn't the order be reversed (first float, then int), according to your definition in the docs?
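
The interface split proposed here could be sketched roughly as below. All class and method names are hypothetical and do not correspond to existing classes in the repository:

```python
from abc import ABC, abstractmethod
import numpy as np


class ThresholdModel(ABC):
    """Hypothetical super-interface: every model returns hard 0/1 calls."""

    @abstractmethod
    def predict(self, counts):
        """Return an int array with one label per cell."""


class ProbabilisticModel(ThresholdModel):
    """Hypothetical child interface: additionally exposes probabilities."""

    @abstractmethod
    def predict_proba(self, counts):
        """Return a float array with one probability per cell."""

    def predict(self, counts, cutoff=0.5):
        # Hard calls are derived from the probabilities, so code that only
        # needs labels can treat both model families the same way.
        return (np.asarray(self.predict_proba(counts)) >= cutoff).astype(int)


class UmiCutoff(ThresholdModel):
    """Toy threshold model: positive if the UMI count reaches a cutoff."""

    def __init__(self, umi_thr):
        self.umi_thr = umi_thr

    def predict(self, counts):
        return (np.asarray(counts) >= self.umi_thr).astype(int)


class SaturatingScore(ProbabilisticModel):
    """Toy probabilistic model: p = counts / (counts + k)."""

    def __init__(self, k):
        self.k = k

    def predict_proba(self, counts):
        counts = np.asarray(counts, dtype=float)
        return counts / (counts + self.k)
```

With this layout a threshold-based method such as ITRAP would not need to return its integer assignment a second time cast to float just to satisfy a probabilistic signature.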

b-schubert · 2025-02-18T13:49:16Z

+        if 'matching_HLA' in self.filters:
+            raise NotImplementedError("Matching HLA filter is not implemented yet.")
+
+        # Filter 4: Complete TCRs


TCR-related QC (filter 4 and 6) is available through scirpy.tl.chain_qc()

b-schubert · 2025-02-18T13:53:50Z

+                    filters &= data[k] >= thr
+
+        # TODO Other filters are not implemented yet, only makes sense once we have the respective data
+        # Filter 2: Hashing singlets


I would remove this: demultiplexing is a preprocessing step that can require additional tools, so isn't it out of scope here?

b-schubert · 2025-02-18T13:57:10Z

+            raise NotImplementedError("Complete TCRs filter is not implemented yet.")
+
+        # Filter 5: Specificity multiplets
+        if 'specificity_multiplets' in self.filters:


This should be implementable. Of course, it only makes sense if multiple dextramers were tested, but a vectorized implementation should handle that edge case as well.
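
A vectorized version might look like the sketch below. It assumes that a "specificity multiplet" is a cell in which more than one pMHC reaches the UMI threshold, and that the negative control column is excluded by the caller; the function name `specificity_singlet_mask` is hypothetical.

```python
import numpy as np


def specificity_singlet_mask(counts, umi_thr):
    """Boolean mask of cells with at most one pMHC at or above `umi_thr`.

    counts: (n_cells, n_pmhc) UMI counts, or a 1-D array for a single pMHC.
    """
    counts = np.asarray(counts)
    if counts.ndim == 1:
        # A single dextramer: treat it as one column.
        counts = counts[:, None]
    n_positive = (counts >= umi_thr).sum(axis=1)
    # With one column n_positive is 0 or 1, so no cell is ever removed.
    return n_positive <= 1
```

The single-dextramer case needs no special branch beyond the reshape: the filter simply becomes a no-op.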

irene-bonapa · 2026-03-31T09:58:20Z

Ready for review.

Added additional filters
Removed ApMHCDeconvolution inheritance since the framework does not really fit
Added compatibility with adata
Fixed typos in umi_count_TRA/TRB, so that this threshold can be also optimised
Incorporated Benni's comments

ArcaneEmergence added 3 commits February 13, 2025 20:48

init ITRAP

2e4b156

refactor

7de0dc7

refactor: remove debugging block

9a0795b

ArcaneEmergence requested review from b-schubert and drEast February 14, 2025 12:32

b-schubert reviewed Feb 18, 2025


b-schubert and others added 4 commits July 30, 2025 17:20

- fixed hard coded gex_key and ir_key in process_model_data

8de2726

at least 2 columns for TCR chain optimization + TCRA/TCRB typo

6460587

clean-up pipeline

79c06d7

add additional filters

628411e

irene-bonapa requested review from b-schubert and removed request for drEast March 31, 2026 09:51

irene-bonapa assigned ArcaneEmergence and irene-bonapa Mar 31, 2026

update test

988dbc5

ArcaneEmergence wants to merge 8 commits into main from feature/itrap

3 participants
