
remove spike via mapping #112

Open

Sam-Sims wants to merge 9 commits into artic-network:main from Sam-Sims:feat/spike_removal


Conversation

@Sam-Sims
Collaborator

addresses #100

Initial stab at this, so things can change if needed.

Essentially, this adds a mapping step to the preprocess workflow that runs before anything else. Any reads that map to the spike-in reference are removed and stored in a `*_spikes.fastq` file (currently output to the `classifications` subfolder). This should also handle spike-ins that contain multiple species, by mapping to a concatenated reference of all of them in those cases. All of the unmapped reads continue through fastp and then on to the rest of Scylla.
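A rough sketch of the idea, for illustration only and not the code in this PR: build one combined reference for multi-species spike-ins, then split reads into spike and kept sets based on which read IDs have an alignment. It assumes minimap2-style PAF output (query name in column 1, target name in column 6, `*` for no hit). The `species|contig` header prefix and all function names (`concat_references`, `mapped_ids_from_paf`, `read_fastq`, `split_spike_reads`) are hypothetical.

```python
from typing import Iterable, Iterator

# header, sequence, '+' line, qualities
FastqRecord = tuple[str, str, str, str]


def concat_references(fastas: dict[str, str]) -> str:
    """Merge per-species reference FASTA text into one multi-record FASTA.

    Headers are prefixed with the species name (a hypothetical convention) so
    that a hit can later be traced back to the spike-in it came from.
    """
    out: list[str] = []
    for species, text in fastas.items():
        for line in text.strip().splitlines():
            out.append(f">{species}|{line[1:]}" if line.startswith(">") else line)
    return "\n".join(out) + "\n"


def mapped_ids_from_paf(paf_lines: Iterable[str]) -> set[str]:
    """Collect query names of reads with at least one alignment in PAF output."""
    ids: set[str] = set()
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        # Column 1 is the query name, column 6 the target name; '*' means no hit.
        if len(fields) >= 6 and fields[5] != "*":
            ids.add(fields[0])
    return ids


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield 4-line FASTQ records, skipping blank lines between records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\n")
        if not header:
            continue
        seq = next(it).rstrip("\n")
        plus = next(it).rstrip("\n")
        qual = next(it).rstrip("\n")
        yield header, seq, plus, qual


def split_spike_reads(
    fastq_lines: Iterable[str], spike_ids: set[str]
) -> tuple[list[FastqRecord], list[FastqRecord]]:
    """Partition reads into (spike, kept) by whether their ID mapped to a spike."""
    spikes: list[FastqRecord] = []
    kept: list[FastqRecord] = []
    for rec in read_fastq(fastq_lines):
        read_id = rec[0][1:].split()[0]  # drop '@' and any header description
        (spikes if read_id in spike_ids else kept).append(rec)
    return spikes, kept
```

In this sketch the `spikes` list would be what ends up in `*_spikes.fastq` and `kept` is what would be handed on to fastp. Matching on read IDs, rather than re-reading alignments, keeps the split independent of which mapper produced the PAF.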

Second, the `check_spike_status` workflow has been changed so that it no longer repeats the mapping. It still reports any spike-ins detected by Kraken, but the mapping stats stored in `spike_count_summary.json` now come from the new mapping step in the preprocess workflow.

This could be confusing, as `mapped_count` now means reads removed, while `classified_count` means any reads Kraken has classified as spike after the removal step (I guess some reads might not be removed in some scenarios), so these field names may benefit from being changed a bit.
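To make the two meanings concrete, here is a hedged sketch of how such a summary could be assembled. The real layout of `spike_count_summary.json` is not shown here, so the per-spike nesting, the `species|contig` reference headers, and Kraken counts already keyed by spike name are all assumptions; `removed_counts_by_spike` and `build_spike_summary` are hypothetical names.

```python
import json
from collections import Counter
from typing import Iterable


def removed_counts_by_spike(paf_lines: Iterable[str]) -> Counter:
    """Count distinct removed reads per spike-in from PAF alignments.

    Assumes reference headers look like 'species|contig' (hypothetical).
    A read with several alignments is counted once, against its first hit.
    """
    first_hit: dict[str, str] = {}
    for line in paf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == "*":
            continue  # malformed line or unmapped read
        spike = fields[5].split("|", 1)[0]
        first_hit.setdefault(fields[0], spike)
    return Counter(first_hit.values())


def build_spike_summary(removed: Counter, kraken_counts: dict[str, int]) -> str:
    """Render a per-spike JSON summary (field layout is illustrative only).

    mapped_count: reads removed by the preprocess mapping step.
    classified_count: reads Kraken still assigns to the spike after removal.
    """
    summary = {
        spike: {
            "mapped_count": removed.get(spike, 0),
            "classified_count": kraken_counts.get(spike, 0),
        }
        for spike in sorted(set(removed) | set(kraken_counts))
    }
    return json.dumps(summary, indent=2)
```

Keeping both counts side by side makes the distinction explicit: under these assumptions, a non-zero `classified_count` for a spike would point at reads that got past the mapping step.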

Updated tests to reflect the new behaviour.

Sam-Sims requested a review from BioWilko November 11, 2025 11:31
Sam-Sims marked this pull request as ready for review December 4, 2025 11:05
