ITRAP implementation #44

Open
ArcaneEmergence wants to merge 8 commits into main from feature/itrap

@ArcaneEmergence

Collaborator

Resolves #3

Re-implementation of ITRAP, adapted to our data format and scenario. I have not (yet) implemented filters based on information other than UMI count, as data availability and nomenclature can vary heavily between datasets.

Short summary:

  1. ITRAP defines significant clonotype specificity with a Wilcoxon test on the UMI counts of the most and second most abundant epitopes. If the p-value is < 0.05 and the clonotype contains more than 10 cells, the expected target is assigned to the clonotype; otherwise the clonotype is excluded from the following threshold search.
  2. Each cell from a significant clonotype is individually assigned the specificity of the epitope with its highest UMI count. An accuracy is then calculated by comparing each cell's assigned specificity with the specificity of its clonotype.
  3. Ideal UMI and UMI ratio thresholds are searched. The thresholds should filter out noisy cells, and the accuracy is calculated on the retained ones. A grid search finds the thresholds that optimize a weighted average of accuracy and retained ratio.
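For reviewers, steps 1 and 2 can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the function names are hypothetical, and it assumes a paired, one-sided Wilcoxon signed-rank test (`scipy.stats.wilcoxon`) with the epitopes ranked by total UMI count within the clonotype.

```python
import numpy as np
from scipy.stats import wilcoxon


def clonotype_specificity(umi, alpha=0.05, min_cells=10):
    """Return the index of the expected epitope, or None if not significant.

    umi: (n_cells, n_epitopes) UMI counts for the cells of ONE clonotype.
    Assumption: epitopes are ranked by their total UMI count in the clonotype.
    """
    umi = np.asarray(umi, dtype=float)
    if umi.shape[0] <= min_cells or umi.shape[1] < 2:
        return None  # needs more than `min_cells` cells and two epitopes
    order = np.argsort(umi.sum(axis=0))[::-1]  # epitopes, most abundant first
    top, second = umi[:, order[0]], umi[:, order[1]]
    if np.all(top == second):
        return None  # the test is undefined if every paired difference is zero
    p_value = wilcoxon(top, second, alternative="greater").pvalue
    return int(order[0]) if p_value < alpha else None


def cell_accuracy(umi, expected):
    """Fraction of cells whose most abundant epitope matches `expected`."""
    per_cell = np.argmax(np.asarray(umi), axis=1)
    return float(np.mean(per_cell == expected))
```

A clonotype that fails either condition returns `None` and would simply be left out of the threshold search.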

To adapt to our case, I compare the UMI counts of the epitope and the negative control, and cells that are filtered out are assigned as negative.
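A rough sketch of the threshold search in this epitope-versus-negative-control setting is given below. It is illustrative only: the function names, the equal default weighting, and the guard against a zero negative-control count are assumptions, not details taken from the ITRAP paper or from this PR.

```python
import numpy as np


def passes(pmhc, neg, umi_thr, ratio_thr):
    """Boolean mask of cells passing the UMI and UMI-ratio thresholds."""
    ratio = pmhc / np.maximum(neg, 1)  # assumption: avoid division by zero
    return (pmhc >= umi_thr) & (ratio >= ratio_thr)


def grid_search(pmhc, neg, expected, umi_grid, ratio_grid, weight=0.5):
    """Find thresholds maximising weight*accuracy + (1 - weight)*retained.

    expected: per-cell label inherited from the significant clonotype
              (1 = epitope, 0 = negative control).
    """
    pmhc, neg, expected = (np.asarray(a) for a in (pmhc, neg, expected))
    cell_call = (pmhc > neg).astype(int)  # per-cell specificity
    best_thresholds, best_score = None, -1.0
    for umi_thr in umi_grid:
        for ratio_thr in ratio_grid:
            keep = passes(pmhc, neg, umi_thr, ratio_thr)
            if not keep.any():
                continue  # nothing retained, accuracy undefined
            accuracy = np.mean(cell_call[keep] == expected[keep])
            score = weight * accuracy + (1 - weight) * keep.mean()
            if score > best_score:
                best_thresholds, best_score = (umi_thr, ratio_thr), score
    return best_thresholds, best_score


def assign(pmhc, neg, umi_thr, ratio_thr):
    """Final assignment: filtered-out cells are set to negative (0)."""
    pmhc, neg = np.asarray(pmhc), np.asarray(neg)
    call = (pmhc > neg).astype(int)
    call[~passes(pmhc, neg, umi_thr, ratio_thr)] = 0
    return call
```

The `weight` parameter trades accuracy against the fraction of retained cells; a higher value favours stricter thresholds.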

Alternatively, we can also assign specificity on a clonotype level, using the Wilcoxon test, though this is not the original ITRAP framework.

Comment thread dextrademixer/model/ITRAP.py Outdated
__name = "ITRAP"
__version = "0.0.1"

def __init__(self, umi_cols=None, umi_count_TRA=None, umi_count_TRB=None, filters=None):

Contributor

I would suggest moving umi_params to preprocess_data, as they are dataset-specific and won't be needed until preprocessing. You can leave the filters here, as they affect the algorithm logic.

Comment thread dextrademixer/model/ITRAP.py Outdated
def __init__(self, umi_cols=None, umi_count_TRA=None, umi_count_TRB=None, filters=None):
    """
    Args:
        umi_cols: List of columns containing UMI counts for pMHCs (default set to ['neg_control', 'pmhc1'])

Contributor

This variable is already covered in preprocess_data under pmhc_key. I would suggest just overloading that parameter so it also accepts an iterable of pmhc_keys.
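One possible shape for that overload, as a small sketch; the helper name `normalize_pmhc_keys` is hypothetical and not part of the existing code base:

```python
from collections.abc import Iterable


def normalize_pmhc_keys(pmhc_key):
    """Accept one column name or an iterable of names; always return a list."""
    if isinstance(pmhc_key, str):
        return [pmhc_key]  # a str is iterable too, so check it first
    if isinstance(pmhc_key, Iterable):
        keys = list(pmhc_key)
        if not all(isinstance(key, str) for key in keys):
            raise TypeError("every pmhc_key entry must be a str")
        return keys
    raise TypeError(
        f"pmhc_key must be a str or an iterable of str, got {type(pmhc_key).__name__}"
    )
```

Calling this once at the start of preprocess_data would let the rest of the method treat the single-key and multi-key cases identically.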

Comment thread dextrademixer/model/ITRAP.py Outdated
"""
Args:
    umi_cols: List of columns containing UMI counts for pMHCs (default set to ['neg_control', 'pmhc1'])
    umi_count_TRA: List of columns containing UMI counts for TRA (default: None)

Contributor

Could you stick to the convention of naming parameters that refer to fields in MuData (X, var, obsm) as xx_key?

for col in self.umi_cols_mhc:
    data[col] = mdata['gex'][:, col].X.toarray().reshape(-1)

def calc_delta(x):

Contributor

Move the internal helper function to the top of the method, directly after the method declaration.

Comment thread dextrademixer/model/ITRAP.py Outdated
self.idx_to_specificity = {i: s for i, s in enumerate(self.umi_cols_mhc)}

data = mdata['airr'].obs.copy()
for col in self.umi_cols_mhc:

Contributor

Why not utilize x = gex[:, pmhc_key].X.toarray().reshape((N,))

and store that X in data.X or obsm? Then you don't need to loop through the list of pMHCs, and you use the AnnData structure properly.

Comment thread dextrademixer/model/ITRAP.py Outdated
self.specificity_to_idx = {s: i for i, s in enumerate(self.umi_cols_mhc)}
self.idx_to_specificity = {i: s for i, s in enumerate(self.umi_cols_mhc)}

data = mdata['airr'].obs.copy()

Contributor

This can be expensive depending on what has been done to the MuData object, e.g. it could store UMAP coordinates, TCR similarities, PCA embeddings, and more.

Why not just create an internal, empty AnnData to store your algorithm-specific information, and extract only the relevant data from the appropriate fields of the input MuData object?

Comment thread dextrademixer/model/ITRAP.py Outdated
self.data['assignment_before_filtering'] = self.data['assignment'].copy()
self.data.loc[~filters, 'assignment'] = 0

return self.data['assignment'].values.astype(int), self.data['assignment'].values.astype(float)

Contributor

Why do you return the assignment twice?
Definitely not a clean solution. I understand why you did what you did, but if the current interface does not fit, we need to abstract that interface further: perhaps a more flexible interface, or a super-interface for threshold-based models and a child interface for probabilistic models that inherits from it.

And shouldn't the order be reversed (first float, then int) according to your definition in the docs?
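To make the suggestion concrete, here is a minimal sketch of such a hierarchy. All class and method names are hypothetical and only illustrate the idea; they are not the existing ApMHCDeconvolution interface.

```python
from abc import ABC, abstractmethod


class ThresholdModel(ABC):
    """Hypothetical super-interface: every model yields hard 0/1 calls."""

    @abstractmethod
    def predict(self, counts):
        """Return one integer assignment per cell."""


class ProbabilisticModel(ThresholdModel):
    """Hypothetical child interface: adds scores, derives hard calls."""

    cutoff = 0.5

    @abstractmethod
    def predict_proba(self, counts):
        """Return one float in [0, 1] per cell."""

    def predict(self, counts):
        return [int(p >= self.cutoff) for p in self.predict_proba(counts)]


class UmiThreshold(ThresholdModel):
    """Toy threshold model: positive when the UMI count reaches min_umi."""

    def __init__(self, min_umi):
        self.min_umi = min_umi

    def predict(self, counts):
        return [int(c >= self.min_umi) for c in counts]


class SaturatingScore(ProbabilisticModel):
    """Toy probabilistic model: score = c / (c + half_point)."""

    def __init__(self, half_point):
        self.half_point = half_point

    def predict_proba(self, counts):
        return [c / (c + self.half_point) for c in counts]
```

With this split, a threshold-based model such as ITRAP would only implement `predict` and would not need to return the same array twice under two dtypes.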

if 'matching_HLA' in self.filters:
    raise NotImplementedError("Matching HLA filter is not implemented yet.")

# Filter 4: Complete TCRs

Contributor

TCR-related QC (filters 4 and 6) is available through scirpy.tl.chain_qc()

filters &= data[k] >= thr

# TODO Other filters are not implemented yet, only makes sense once we have the respective data
# Filter 2: Hashing singlets

Contributor

I would remove this: demultiplexing is a preprocessing step that can require additional tools, so isn't it out of scope here?

raise NotImplementedError("Complete TCRs filter is not implemented yet.")

# Filter 5: Specificity multiplets
if 'specificity_multiplets' in self.filters:

Contributor

This should be implementable. Of course, it only makes sense if multiple dextramers were tested, but a vectorized implementation should take care of that edge case as well.
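A vectorized version could look roughly like the sketch below. It assumes that a specificity multiplet is a cell in which more than one pMHC reaches its UMI threshold, and that the caller has already excluded the negative-control column; both are assumptions for illustration, and the function name is hypothetical.

```python
import numpy as np


def specificity_multiplet_mask(umi, thresholds):
    """Return True for cells to keep (at most one pMHC reaches its threshold).

    umi: (n_cells, n_pmhc) UMI counts, or (n_cells,) for a single dextramer.
    thresholds: scalar or (n_pmhc,) array of per-pMHC UMI thresholds.
    """
    umi = np.asarray(umi)
    umi = umi.reshape(len(umi), -1)  # a 1-D input becomes a single column
    n_hits = (umi >= np.asarray(thresholds)).sum(axis=1)
    return n_hits <= 1
```

With a single dextramer, `n_hits` can never exceed 1, so every cell is kept and no special casing is required.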

@irene-bonapa requested review from b-schubert and removed request for drEast on March 31, 2026, 09:51
@irene-bonapa

Ready for review.

  • Added additional filters
  • Removed ApMHCDeconvolution inheritance since the framework does not really fit
  • Added compatibility with adata
  • Fixed typos in umi_count_TRA/TRB, so that this threshold can be also optimised
  • Incorporated Benni's comments

Development

Successfully merging this pull request may close these issues.

Implementation of ATRAP approach

3 participants