fix: change scorefiles to queue channel to reduce memory usage #488
Draft
katgorski wants to merge 4 commits into
Fix for #475
Splits scores into a queue channel after the DOWNLOAD_SCOREFILES process so that each score file is formatted individually in the downstream analyses, without modifying the match processes. The report process and template are also modified to handle multiple JSON score metadata files instead of a single JSON file. There are subtle changes to chain file handling to accommodate the use of a queue channel as input into FORMAT_SCOREFILES.
I aimed to make only workflow changes and not modify any processes; I will likely look at the pygscatalog repo later to check memory usage there.
Draft because I am still testing things thoroughly, but if there's anything major I missed, let me know so I can modify it. Checking the mechanics locally, things seem to be fine, but the tests already present in the repo all run off their own small nf scripts; from browsing them, there isn't one that will easily confirm the behavior of my changes, since they are mostly workflow related. I also need a test set for liftover: I modified that channel a bit in order to properly pass in the chain files multiple times.
copilot's summary below:
Scorefile and log file handling improvements:
- Changed the scorefile input in INPUT_CHECK to a queue channel for better compatibility with Nextflow's channel operations, and updated downstream usage accordingly.
- Updated the SCORE_REPORT process to accept log_scorefiles as a separate input, ensuring that scorefile metadata is passed explicitly and consistently.
- Updated the REPORT workflow to collect log_scorefiles into a channel before passing to SCORE_REPORT, and removed an unnecessary combine operation.
Scorefile metadata loading:
- Updated the report template (report.qmd) to load all JSON files in the working directory, rather than relying on a single path from parameters, allowing for more flexible metadata aggregation.
Workflow input and channel management:
- Updated the PGSCCALC workflow to ensure correct input types and avoid issues with empty channels.
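The metadata-loading change in the summary (reading every JSON file in the working directory instead of one path taken from parameters) could look roughly like the sketch below. It is written in Python purely for illustration and is not the code in report.qmd. The helper name `load_score_metadata` and the assumption that each file holds a single JSON object are hypothetical.

```python
import json
from pathlib import Path


def load_score_metadata(workdir="."):
    """Merge every *.json file in workdir into one dict keyed by file name.

    Hypothetical helper: assumes each file holds a single JSON object
    describing one formatted score file.
    """
    metadata = {}
    # sorted() keeps the iteration order stable across runs
    for path in sorted(Path(workdir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            metadata[path.name] = json.load(handle)
    return metadata
```

Globbing the directory means the report does not need to know in advance how many metadata files the queue channel produced. The trade-off is that any unrelated JSON file staged into the same directory would also be picked up.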