fix: change scorefiles to queue channel to reduce memory usage by katgorski · Pull Request #488 · PGScatalog/pgsc_calc

katgorski · 2026-04-03T14:27:48Z

Fix for #475

Splits the score files into a queue channel after the DOWNLOAD_SCOREFILES process so that each score file is formatted individually downstream, without modifying the match processes. The report process and template are also modified to handle multiple JSON score metadata files instead of a single JSON file. Chain file handling changes slightly to accommodate a queue channel as input to FORMAT_SCOREFILES.

Aimed to only make workflow changes and not modify any processes; will likely look at the pygscatalog repo later to check out memory usage there.

Draft because I'm still testing things thoroughly, but if there's anything major I missed, let me know so I can fix it. Checking the mechanics locally, things seem fine. However, the tests already in the repo each run off their own small nf scripts, and browsing through them I didn't find anything that would easily confirm the behavior of my changes, since those are mostly workflow-related. I also need a test set for liftover: I modified that channel a bit so the chain files are passed in properly multiple times.

Copilot's summary below:

Scorefile and log file handling improvements:

Changed the scorefile input in INPUT_CHECK to a queue channel for better compatibility with Nextflow's channel operations, and updated downstream usage accordingly.
Updated the SCORE_REPORT process to accept log_scorefiles as a separate input, ensuring that scorefile metadata is passed explicitly and consistently.
Modified the REPORT workflow to collect log_scorefiles into a channel before passing to SCORE_REPORT, and removed an unnecessary combine operation.

Scorefile metadata loading:

Updated the report generation script (report.qmd) to load all JSON files in the working directory, rather than relying on a single path from parameters, allowing for more flexible metadata aggregation.
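
To make the "load all JSON files in the working directory" behaviour concrete, here is a minimal sketch in Python. It is illustrative only: the language, the function name `load_score_metadata`, and the assumption that each file holds either one record or a list of records are not taken from report.qmd, which may be written differently.

```python
import json
from pathlib import Path


def load_score_metadata(workdir="."):
    """Parse every *.json file in workdir and return all records in one list.

    Hypothetical helper: the name and the record layout are assumptions
    made for illustration, not the actual report.qmd implementation.
    """
    records = []
    # Sort so the result does not depend on filesystem listing order.
    for path in sorted(Path(workdir).glob("*.json")):
        with path.open() as handle:
            parsed = json.load(handle)
        # Assumption: a file may hold a single record (dict) or several (list).
        if isinstance(parsed, list):
            records.extend(parsed)
        else:
            records.append(parsed)
    return records
```

With one metadata file staged per score file, a call such as `load_score_metadata(".")` would return the combined records without the report needing a path parameter for each file.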

Workflow input and channel management:

Improved the handling of optional chain files and scorefile flattening in the main PGSCCALC workflow to ensure correct input types and avoid issues with empty channels.

katgorski added 4 commits March 31, 2026 15:39

flatten scorefile download channel (69a3e40)
rm extra script declaration (458185f)
alter downstream handling of scorefiles for queue channel structure (6ca8109)
update report generation to handle multiple json (3c1463e)

katgorski wants to merge 4 commits into PGScatalog:main from katgorski:change-scorefiles-to-queue-channel
