remove spike via mapping by Sam-Sims · Pull Request #112 · artic-network/scylla

Sam-Sims · 2025-10-20T12:59:23Z

addresses #100

Initial stab at this, so things can change if needed.

Essentially, this adds a mapping step to the preprocess workflow that runs before anything else. Any reads that map to the spike reference are removed and stored in a *_spikes.fastq file (currently output to the classifications subfolder). Spike-ins that contain multiple species should also be handled, by mapping to a concatenated reference of each species in those cases. All unmapped reads continue through fastp and then on to the rest of Scylla.
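As a rough illustration of the read partitioning described above, here is a minimal Python sketch. It is not the pipeline's implementation. It assumes the mapper emits PAF-style alignments (tab-separated, read name in the first column); the function names `spike_read_ids`, `fastq_records` and `partition_reads`, and the toy reference names, are hypothetical.

```python
from typing import Iterable, Iterator, List, Set, Tuple


def spike_read_ids(paf_lines: Iterable[str]) -> Set[str]:
    """Collect IDs of reads with at least one alignment (PAF column 1 is the read name)."""
    return {line.split("\t", 1)[0] for line in paf_lines if line.strip()}


def fastq_records(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_id, record_text) for each 4-line FASTQ record."""
    rows = [line.rstrip("\n") for line in lines]
    for i in range(0, len(rows) - 3, 4):
        read_id = rows[i][1:].split()[0]  # drop '@', keep the ID before any whitespace
        yield read_id, "\n".join(rows[i:i + 4]) + "\n"


def partition_reads(
    fastq_lines: Iterable[str], spike_ids: Set[str]
) -> Tuple[List[str], List[str]]:
    """Split reads into (spikes, remaining) according to whether they mapped."""
    spikes: List[str] = []
    remaining: List[str] = []
    for read_id, record in fastq_records(fastq_lines):
        (spikes if read_id in spike_ids else remaining).append(record)
    return spikes, remaining


# Toy data (hypothetical): read1 and read3 hit a concatenated two-species reference.
paf = [
    "\t".join(["read1", "8", "0", "8", "+", "spike_species_A", "5000", "10", "18", "8", "8", "60"]),
    "\t".join(["read3", "8", "0", "8", "-", "spike_species_B", "7000", "40", "48", "8", "8", "60"]),
]
fastq = [
    "@read1", "ACGTACGT", "+", "IIIIIIII",
    "@read2 extra description", "TTGACCAA", "+", "IIIIIIII",
    "@read3", "GGCATTCA", "+", "IIIIIIII",
]

# spikes would go to the *_spikes.fastq file; remaining would continue to fastp.
spikes, remaining = partition_reads(fastq, spike_read_ids(paf))
```

Because the reference is concatenated, a hit to any of the spike species is enough to place a read in the spike set, which is why the sketch only checks for membership and ignores the target name.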

Secondly, the check_spike_status workflow has been changed so that it no longer repeats the mapping. It still reports any spike-ins detected by kraken, but the mapping stats stored in spike_count_summary.json now come from the new mapping step in the preprocess workflow.

This could be confusing, as mapped_count now means reads removed, and classified_count means any reads kraken has classified as spike after the removal process (some reads might not be removed in some scenarios, I guess), so these field names may benefit from being changed a bit.
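To make the two meanings concrete, here is a small Python sketch of how the counts relate. Only `mapped_count` and `classified_count` come from this PR; the function `spike_summary`, the extra keys `remaining_reads` and `residual_spike_percent`, and the numbers are hypothetical and do not reflect the actual layout of spike_count_summary.json.

```python
import json


def spike_summary(total_reads: int, removed_reads: int, kraken_spike_reads: int) -> dict:
    """Relate the two counts: reads removed by mapping vs. reads kraken still calls spike."""
    remaining = total_reads - removed_reads
    return {
        "mapped_count": removed_reads,           # reads removed by the mapping step
        "classified_count": kraken_spike_reads,  # spike reads kraken finds after removal
        # Hypothetical helper fields, not part of the PR:
        "remaining_reads": remaining,
        "residual_spike_percent": round(100 * kraken_spike_reads / remaining, 2) if remaining else 0.0,
    }


# Hypothetical numbers: 1000 input reads, 200 removed, 4 still classified as spike.
summary = spike_summary(total_reads=1000, removed_reads=200, kraken_spike_reads=4)
print(json.dumps(summary, indent=2))
```

Note that any percentage computed downstream of the removal uses the remaining reads as its denominator, so it excludes the removed spike reads.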

Updated tests to reflect the new behaviour.


Sam-Sims added 9 commits October 20, 2025 13:10

feat: introduce spike removal via mapping

4d21277

adds a step to preprocess workflow that removes any spiked reads before running anything else. onyx percentages will now be percentages excluding spike

feat: update check_spike_status with new mapping workflow

11419f8

fix: adjust input channels in check_spike_ins to properly merge spike_mapping_stats tuple

e505609

refactor: rename pass/fail to detected/absent

6b28cc0

no longer semantically a "pass" or "fail" since reads should be removed

tests(spike): update tests for new process

86ffd2d

tests: update test cases when mapping and have both kraken or no kraken classifications

04c0e34

feat: disallow secondary alignments

0068c3f

tests: update snapshots

b214f76

revert: restore pass/fail terminology in spike report

bb23c7b

Sam-Sims mentioned this pull request Nov 5, 2025

new mSCAPE metadata fields for spike removal CLIMB-TRE/mscape-dipi-group#8 (Open, 4 tasks)

Sam-Sims requested a review from BioWilko November 11, 2025 11:31

Sam-Sims marked this pull request as ready for review December 4, 2025 11:05

Sam-Sims wants to merge 9 commits into artic-network:main from Sam-Sims:feat/spike_removal
