
release(v0.1.0): base IPG — db_construct + post_ms validated on real data#6

Merged
sanjaysgk merged 45 commits into main from dev/cryptic-port
Apr 15, 2026
Conversation

sanjaysgk (Owner) commented Apr 14, 2026

Summary

First production release of sanjaysgk/ipg. Covers the base IPG functionality — end-to-end RNA-seq → cryptic peptide FASTA (--step db_construct) plus the 2-phase db_compare + origins analysis after a MS search (--step post_ms).

Validation status

| Step | Validation | Evidence |
|------|------------|----------|
| db_construct | ✅ real data | D106_liver SLURM 54810402 completed; D122_lung running now (54827535) |
| post_ms | ✅ real data | D122_Liver DB_COMPARE Phase 1 COMPLETED; ORIGINS now builds via bin/build_ipg_tools.sh |
| ms_search | ⚠️ experimental | DSL parses but not yet run on real mzMLs. Code merged for forward compatibility; real validation deferred to dev/test-suite |
| immunoinformatics | ⚠️ experimental | Same |

Known issues (pre-existing, not regressions)

  • nf-core linting check has been failing on legacy formatting inherited from earlier main. Not a regression.
  • Container ghcr.io/sanjaysgk/ipg-tools:0.2.0 doesn't exist publicly → use bin/build_ipg_tools.sh to compile C tools into pixi env first.

After merge

  • Tag v0.1.0-base on the merge commit
  • Open dev/test-suite branch (plan in memory/project_test_suite_plan.md) for ms_search validation

Test plan

  • D106_liver db_construct completed end-to-end
  • D122_Liver post_ms Phase 1 completed
  • D122_lung db_construct running now
  • Synthetic post_ms fixture passes DB_COMPARE + ORIGINS_SIMPLE + DB_COMPARE_PHASE2
  • ms_search real-data validation (deferred)

Signed-off-by: sanjaysgk <44039457+sanjaysgk@users.noreply.github.com>
Add the ms_search pipeline step with three new local modules:

- PREPARE_FASTA: generates target-decoy FASTA if decoys absent
- MSFRAGGER: runs MSFragger database search (user provides JAR)
- MOKAPOT: semi-supervised FDR control on search engine PIN output

New subworkflow MS_SEARCH chains: PREPARE_FASTA → MSFRAGGER → MOKAPOT
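The target-decoy step can be sketched as reversed-sequence decoy generation — a minimal illustration only (the `rev_` prefix and function name are assumptions here, not necessarily what bin/prepare_fasta.py actually uses):

```python
def add_reversed_decoys(entries, decoy_prefix="rev_"):
    """Append a reversed-sequence decoy for every target entry.

    entries: list of (header, sequence) tuples, headers without '>'.
    Entries already carrying the decoy prefix are skipped, so the
    function is idempotent when decoys are partially present.
    """
    out = list(entries)
    for header, seq in entries:
        if header.startswith(decoy_prefix):
            continue
        out.append((decoy_prefix + header, seq[::-1]))
    return out

targets = [("sp|P12345|EXAMPLE", "MKWVTFISLL")]
db = add_reversed_decoys(targets)
```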

Also adds:
- assets/ms_search_params/: MSFragger parameter templates for all
  instrument/mod combinations (orbitrap/timsTOF × mod/nomod/TMT/mhcii)
- assets/schema_ms_input.json: MS data samplesheet validation schema
- conf/modules/ms_search.config: publishDir routing
- bin/prepare_fasta.py: standalone CLI extracted from Willems core.py

New params: --ms_input, --search_fasta, --engines, --msfragger_jar,
--msfragger_mem, --instrument, --mod_type, --peptide_length, --peaks_psm_csv

Sprint 2 will add Comet, Sage, and CONVERT_MZML modules.

Signed-off-by: sanjaysgk <44039457+sanjaysgk@users.noreply.github.com>
Schema now validates the comma-list against msfragger|comet|sage via a
pattern regex, so invalid engines fail at validation before any process
launches. Updates all four call sites: params default, schema, workflow
entry, and the utils_nfcore guard.
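The comma-list pattern is equivalent in spirit to the regex below (illustrative — the exact pattern in nextflow_schema.json may differ):

```python
import re

# Accepts one or more known engine names separated by commas,
# rejecting empty strings and unknown engines at validation time.
ENGINE_PATTERN = re.compile(r"^(msfragger|comet|sage)(,(msfragger|comet|sage))*$")

def engines_valid(value: str) -> bool:
    return bool(ENGINE_PATTERN.match(value))
```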
Wraps comet-ms bioconda binary. Consumes calibrated mzMLs from MSFragger
(or raw mzMLs if MSFragger not in --ms_engines) and emits per-run PIN
files for mokapot. Params path passed via -P so the FASTA inside the
param file is ignored in favour of -D.
Wraps sage-proteomics. Emits a single combined PIN for all mzMLs in one
shot (Sage differs from Comet here). When MSFragger's search_log.txt is
staged alongside, Sage inherits the calibrated fragment tolerance and
topN peaks — mirrors core.py run_Sage L516-537.
Per-mzML pymzml pass that emits MGF + scans.pkl + index2scan.pkl.
Extracted from immunopeptidomics core.py read_mzML L679 and write_MGF
L641. scans.pkl is keyed run_scan and carries precursor mz, charge, RT,
and ion mobility for downstream MS2Rescore; index2scan.pkl is the
spectrum-index→scan mapping PEAKS integration will need in Sprint 4.
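The data contract between CONVERT_MZML and downstream consumers looks roughly like this (record shapes and the exact `run_scan` key format are assumptions from the description above):

```python
import os
import pickle
import tempfile

# Hypothetical record shapes for the two pickles CONVERT_MZML writes.
scans = {
    "D122_run1_2048": {"mz": 512.77, "charge": 2, "rt": 31.4, "im": 0.92},
}
index2scan = {0: 2048}  # spectrum index -> scan number, for PEAKS in Sprint 4

path = os.path.join(tempfile.mkdtemp(), "scans.pkl")
with open(path, "wb") as fh:
    pickle.dump(scans, fh)
with open(path, "rb") as fh:
    loaded = pickle.load(fh)
```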
mokapot writes both *.mokapot.psms.txt (target) and
*.mokapot.decoy.psms.txt (decoy) with overlapping globs, so the Sprint 1
psms output was non-deterministic when --keep_decoys was set. Rename to
*.target.psms.txt and *.decoy.psms.txt after mokapot finishes so
MS2Rescore prep can pick them up separately. No downstream consumer of
the old names exists yet.
Wraps ms2rescore CLI per engine. The prep script extracts per-engine
SpecId parsing (MSFragger / Comet / Sage / PEAKS all format it
differently) from core.py rescore_* L951-1117, merges mokapot target +
decoy PSMs with the combined PIN's percolator features, and pulls
per-scan precursor info from scans.pkl. Output TSV is the exact shape
ms2rescore expects, with feature columns prefixed `rescoring:`.
Merges rescored PSMs across engines at 1% PSM- and peptide-level FDR,
separates chimeric spectra (scans assigned to >1 peptide) into their
own audit file, and emits an integrated peptide table with per-run PSM
counts. Mirrors core.py read_results L1451 and read_psms L1313 but is
self-contained: takes engine_name=path pairs on the CLI so the Nextflow
module does not need to know engine count at compile time. Protein
info (gene, species, description) is parsed from the search FASTA
headers assuming UniProt-style GN= / OS= fields.
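The UniProt-style header parse can be sketched like this (regexes are illustrative, not the integrate script's exact ones):

```python
import re

def parse_uniprot_header(header: str):
    """Pull gene (GN=), species (OS=), and free-text description from a
    UniProt-style FASTA header. Absent fields come back as None."""
    desc_m = re.match(r"^\S+\s+(.*?)\s+OS=", header)
    os_m = re.search(r"OS=(.+?)(?=\s+[A-Z]{2}=|$)", header)
    gn_m = re.search(r"GN=(\S+)", header)
    return {
        "gene": gn_m.group(1) if gn_m else None,
        "species": os_m.group(1) if os_m else None,
        "description": desc_m.group(1) if desc_m else None,
    }

h = "sp|P69905|HBA_HUMAN Hemoglobin subunit alpha OS=Homo sapiens OX=9606 GN=HBA1 PE=1 SV=2"
```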
Copied verbatim from immunopeptidomics/external_tools/{Comet,Sage,ms2rescore}/params/
so every {instrument}_{mod_type} combination the ms_search subworkflow
references already exists on disk. 11 files per engine covering
orbitrap/timsTOF × mod/nomod/TMT10/TMT16/mhcii/lowres.
Expands the subworkflow from Sprint 1's single-engine flow to the full
open-source search pipeline:

    PREPARE_FASTA
      → MSFRAGGER → calibrated mzML + PIN
      → COMET + SAGE run in parallel on the calibrated mzMLs
      → MOKAPOT aliased per-engine (three independent FDR instances)
      → CONVERT_MZML fans out over mzMLs, grouped per sample for MGF + scans
      → MS2RESCORE aliased per-engine, rescores each mokapot output
      → INTEGRATE_ENGINES merges to a single unified peptides/PSMs table

Engine selection is gated on --ms_engines everywhere; when MSFragger is
not selected, Comet/Sage fall back to the raw mzML inputs and Sage skips
the calibrated-settings inheritance path. INTEGRATE receives
engine-name/TSV pairs via groupTuple so it does not need to know which
engines ran at compile time.
Same reference bundle + GATK resources as params_D122_liver.yaml; only
the input samplesheet and outdir differ. Used for the ongoing
dev/cryptic-port validation run on xy86.
Schema pattern now allows peaks alongside msfragger/comet/sage. Validator
rejects a run that lists peaks without also supplying --peaks_psm_csv.
Adds --peaks_min_match_fraction knob (default 0.98) that gates how many
PEAKS rows must resolve to real scan numbers before conversion proceeds.
Converts a PEAKS Studio db.psms.csv export into a PIN that the existing
MOKAPOT module consumes unchanged. PEAKS reports spectrum *indices*, so
the script resolves them against one or more index2scan.pkl pickles
from CONVERT_MZML and aborts if fewer than peaks_min_match_fraction of
rows line up — that catches the common 'PEAKS searched a different
mzML' mistake early. Extracted from core.py run_PEAKS L555.
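The match-fraction guard amounts to the following (function name and error text are hypothetical; the real script lives in the CONVERT_PEAKS module):

```python
def resolve_peaks_indices(spectrum_indices, index2scan, min_match_fraction=0.98):
    """Map PEAKS spectrum indices to real scan numbers via the
    index2scan mapping from CONVERT_MZML. Raises when too few rows
    resolve, which usually means PEAKS searched a different mzML."""
    resolved = [index2scan[i] for i in spectrum_indices if i in index2scan]
    fraction = len(resolved) / len(spectrum_indices) if spectrum_indices else 0.0
    if fraction < min_match_fraction:
        raise ValueError(
            f"only {fraction:.1%} of PEAKS rows resolved to scan numbers "
            f"(threshold {min_match_fraction:.0%}) -- wrong mzML?"
        )
    return resolved
```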
Fourth engine path: CONVERT_PEAKS (gated on params.peaks_psm_csv) →
MOKAPOT_PEAKS → MS2RESCORE_PEAKS → INTEGRATE_ENGINES. CONVERT_PEAKS
consumes the index2scan pickles already emitted by CONVERT_MZML per
sample, so no new scan parsing is needed. MS2RESCORE is aliased a
fourth time; INTEGRATE picks PEAKS up automatically through the same
groupTuple channel as the other engines.
Introduces --run_netmhcpan, --run_netmhciipan, --run_gibbscluster,
--run_flashlfq, --run_blastp_host boolean gates alongside the
user-supplied tool paths (--netmhcpan_path, --netmhciipan_path,
--gibbscluster_path), --hla allele list, --blast_db prefix, and
--host_species. Each downstream tool is individually gated so users
only run what they're licensed or configured for. Schema adds a new
immunoinformatics_options group referenced from allOf.
Wrap the academic-licensed netMHCpan-4.1 / netMHCIIpan-4.3 binaries
supplied by the user. Each module filters the integrated peptides
table to the relevant length range (8-12 for class I, 13-18 for class
II), calls the binary, and pipes stdout through parse_netmhcpan.py
which extracts the PEPLIST / Sequence result rows and keeps the best
ranked allele per peptide. Extracted from core.py netMHCpan() L1802
and get_best_binder() L1780.
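The best-binder selection reduces to a per-peptide minimum over netMHCpan %Rank (lower = stronger predicted binding) — a sketch, not the parser's actual code:

```python
def best_binder_per_peptide(rows):
    """rows: dicts with 'peptide', 'allele', 'rank' (netMHCpan %Rank).
    Keep the best-ranked (lowest %Rank) allele per peptide."""
    best = {}
    for row in rows:
        cur = best.get(row["peptide"])
        if cur is None or row["rank"] < cur["rank"]:
            best[row["peptide"]] = row
    return best

rows = [
    {"peptide": "SIINFEKL", "allele": "HLA-A*02:01", "rank": 2.5},
    {"peptide": "SIINFEKL", "allele": "HLA-B*07:02", "rank": 0.3},
    {"peptide": "GILGFVFTL", "allele": "HLA-A*02:01", "rank": 0.01},
]
best = best_binder_per_peptide(rows)
```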
Runs the user-supplied GibbsCluster-2.0e_SA.pl on immunopeptides and
picks the winning cluster count by the largest KLD sum across rows of
gibbs.KLDvsClusters.tab. parse_gibbs.py then reads the matching
res/gibbs.<N>g.out file and emits a peptide→cluster mapping. Class-I
length sets (max < 13) get the -C / -D 4 / -I 1 flags as in core.py
run_Gibbs() L2041.
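Picking the winning cluster count is a one-liner once gibbs.KLDvsClusters.tab is parsed into per-N KLD rows (table parsing assumed done; this only shows the selection rule):

```python
def pick_cluster_count(kld_table):
    """kld_table: maps cluster count N -> list of per-group KLD values,
    one row of gibbs.KLDvsClusters.tab each. The winner is the N whose
    KLD sum across the row is largest."""
    return max(kld_table, key=lambda n: sum(kld_table[n]))
```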
Quantifies peptides across runs using bioconda's flashlfq=2.1.4 so no
dotnet/user binary is needed. prepare_flashlfq_input.py melts the
PSMs_run_* columns from integrated_peptides.tsv into FlashLFQ's long
idt format and de-duplicates on (Full Sequence, charge, File Name).
Match-between-runs is toggled on automatically when >1 MS file is
present. Derived from core.py run_FlashLFQ() L1119 but fed from the
unified integrated peptide table rather than per-engine mokapot
outputs.
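The melt + de-duplication can be sketched without pandas (column names here are placeholders; FlashLFQ's required input columns are more extensive than shown):

```python
def melt_psm_counts(peptide_rows, run_columns):
    """Melt wide per-run PSM count columns into long rows, keeping only
    runs where the peptide was observed, and de-duplicating on
    (sequence, charge, run) as the FlashLFQ prep script does."""
    seen = set()
    long_rows = []
    for row in peptide_rows:
        for run in run_columns:
            if row.get(run, 0) > 0:
                key = (row["sequence"], row["charge"], run)
                if key not in seen:
                    seen.add(key)
                    long_rows.append({"Base Sequence": row["sequence"],
                                      "Charge": row["charge"],
                                      "File Name": run})
    return long_rows

peptide_rows = [
    {"sequence": "SIINFEKL", "charge": 2, "PSMs_run_A": 3, "PSMs_run_B": 0},
    {"sequence": "GILGFVFTL", "charge": 2, "PSMs_run_A": 1, "PSMs_run_B": 2},
]
long_rows = melt_psm_counts(peptide_rows, ["PSMs_run_A", "PSMs_run_B"])
```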
Filters the integrated peptides to non-host rows (species column does
not contain --host_species and peptide sequence doesn't say
'contaminant'), writes a FASTA with I→L substitution, and runs
blastp-short against the user-supplied --blast_db. Top hit per peptide
is merged back into the peptide table as BLASTP_ident%, BLASTP_match,
BLASTP_matchedSeq columns; 100% identity rows have the host species
appended for consistent downstream filtering. Mirrors core.py
run_BLASTP() L2181.
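The non-host filter and I→L substitution (BLAST cannot distinguish the isobaric residues from MS data anyway) look roughly like this — a sketch with a hypothetical function name, not the module's script:

```python
def nonhost_blast_fasta(peptides, host_species="Homo sapiens"):
    """Emit FASTA text for non-host, non-contaminant peptides with
    I->L substitution applied, ready for blastp-short."""
    lines = []
    for idx, p in enumerate(peptides):
        species = p.get("species", "")
        if host_species in species or "contaminant" in species.lower():
            continue
        lines.append(f">pep{idx}")
        lines.append(p["sequence"].replace("I", "L"))
    return "\n".join(lines)

peptides = [
    {"sequence": "SIINFEKL", "species": "Mus musculus"},
    {"sequence": "AAAAKAAA", "species": "Homo sapiens"},
]
fasta = nonhost_blast_fasta(peptides)
```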
New IMMUNOINFORMATICS subworkflow gates the five downstream modules on
individual --run_* flags so a user can run, say, only netMHCpan + Gibbs
without touching BLAST or FlashLFQ. Errors early if a tool is requested
without the licensed binary or the HLA list. The main workflow invokes
it right after MS_SEARCH when the user has asked for at least one tool
— otherwise it's skipped entirely and the ms_search step ends at the
integrated peptide table like before.
Nextflow processes with conditional/optional inputs need a concrete
file path to stage; zero-byte NO_FILE is the nf-core convention for
signalling 'no file supplied' from a subworkflow to a module. Consumed
first by IMMUNOINFORMATICS_REPORT in Sprint 6 — SAGE already referenced
it but was tolerant of the missing file; this makes the sentinel
explicit.
Self-contained per-sample HTML report with embedded PNG plots. Accepts
six optional inputs (integrated peptides + netMHCpan/IIpan best-binder
tables + Gibbs clusters + FlashLFQ quant + blastp-annotated peptides),
each padded with assets/NO_FILE when the matching --run_* gate was off
upstream. Sections degrade gracefully: missing tables simply omit the
section rather than failing the run. Plot code is the essential subset
of core.py histogram_plotter L1615, id_per_run L1718, netMHCpan_overview
L1881, and gibbs_plot L2084; sequence logos use logomaker when
available.
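The graceful-degradation gate boils down to a sentinel check like this (a sketch; the report script's actual helper may differ):

```python
import tempfile
from pathlib import Path

def optional_table(path):
    """Return the path when a real input was staged; None when the
    NO_FILE sentinel (zero-byte placeholder) was padded in instead,
    so the report simply omits that section."""
    p = Path(path)
    if p.name == "NO_FILE" or (p.exists() and p.stat().st_size == 0):
        return None
    return p

# A real table with content is kept; a padded sentinel is skipped.
real = Path(tempfile.mkdtemp()) / "netmhcpan_best.tsv"
real.write_text("peptide\tallele\trank\n")
```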
After every gated tool has had its chance to populate its output
channel, fan them all into IMMUNOINFORMATICS_REPORT keyed on meta,
padding any absent channel with assets/NO_FILE. The module itself
tests each input for the NO_FILE name and skips the corresponding
--flag to the python script, so users who only enable netMHCpan still
get an HTML page.
Adds a dedicated 'MS search' section to docs/usage.md covering the
samplesheet shape, engine selection, licensed-tool placement, and a
worked invocation; lists the five optional immunoinformatics gates in
a single table. README.md gets a matching paragraph pointing at the
new step. conf/test_ms_search.config is a new stub-only profile that
exercises the ms_search subworkflow wiring without requiring a real
mzML or licensed MSFragger JAR — register it in nextflow.config
alongside the existing test / test_full profiles.
sanjaysgk/ipg deliberately does not ship licensed binaries (MSFragger,
netMHCpan, netMHCIIpan, GibbsCluster) or user-specific blast DBs.
docs/external_tools.md spells out what bioconda already installs, what
the user must supply, where to put it on M3, and a full params YAML
example. bin/check_external_tools.sh is a pre-flight validator that
reads the same params YAML and verifies each configured path is
readable/executable (plus .pin/.phr/.psq siblings for blast DBs), so
misconfigured runs fail instantly instead of two hours in.
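The BLAST-DB sibling check is in the spirit of the following (a Python sketch of what bin/check_external_tools.sh does in shell; protein DBs carry .phr/.pin/.psq siblings):

```python
import os
import tempfile

def check_blast_db(prefix):
    """Return the BLAST protein-DB sibling extensions missing next to
    the configured --blast_db prefix."""
    return [ext for ext in (".phr", ".pin", ".psq")
            if not os.path.isfile(prefix + ext)]

# Build a fake DB with one sibling missing.
d = tempfile.mkdtemp()
prefix = os.path.join(d, "host_db")
for ext in (".phr", ".pin"):
    open(prefix + ext, "w").close()
```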
feat(ms_search): add MS search + immunoinformatics steps (Sprints 2-7)
Two distinct fixes:

1. Vendored Comet/Sage/MS2Rescore param templates under
   assets/ms_search_params/ are upstream files copied verbatim from
   immunopeptidomics/external_tools/. Reformatting them would diverge
   from upstream and make future syncs painful — same logic as the
   existing /containers/ipg-tools/src/** carve-out. Add a matching
   editorconfig stanza to unset all rules under that path.

2. Three of my own .nf files had Nextflow continuation indents that
   weren't multiples of 4 (immunoinformatics_report tuple input,
   immunoinformatics subworkflow comment block, and python heredocs in
   blastp_host). Tightened to satisfy the linter without changing
   semantics.
ci: fix pre-commit editorconfig violations
Three pieces:

1. immunoinformatics_report/meta.yml had a missing space after
   'netmhciipan:' that broke YAML compact-mapping syntax.
2. Add assets/ms_search_params/ to .prettierignore so prettier leaves
   the vendored Comet/Sage/MS2Rescore param templates alone — same
   rationale as the .editorconfig carve-out.
3. Apply prettier auto-format to all module meta.yml / environment.yml
   and the two new docs files (whitespace-only changes; no semantic
   diff).
ci: round 2 lint — prettier auto-format + ignore vendored params + yaml fix

@sourcery-ai sourcery-ai bot left a comment


Sorry @sanjaysgk, your pull request is larger than the review limit of 150000 diff characters


github-actions bot commented Apr 14, 2026

nf-core pipelines lint overall result: Failed ❌

Posted for pipeline commit 85c9f7f

✅ 179 tests passed
❔  12 tests were ignored
❗   3 tests had warnings
❌  14 tests failed
Details

❌ Test failures:

  • nextflow_config - Outdated lines for loading custom profiles found. File should contain:
// Load nf-core custom profiles from different institutions
includeConfig params.custom_config_base && (!System.getenv('NXF_OFFLINE') || !params.custom_config_base.startsWith('http')) ? "${params.custom_config_base}/nfcore_custom.config" : "/dev/null"
  • files_unchanged - .github/workflows/branch.yml does not match the template
  • files_unchanged - .github/workflows/linting.yml does not match the template
  • files_unchanged - assets/email_template.html does not match the template
  • files_unchanged - assets/email_template.txt does not match the template
  • files_unchanged - assets/sendmail_template.txt does not match the template
  • files_unchanged - docs/README.md does not match the template
  • files_unchanged - .gitignore does not match the template
  • files_unchanged - .prettierignore does not match the template
  • actions_awsfulltest - .github/workflows/awsfulltest.yml is not triggered correctly
  • template_strings - Found a Jinja template string in /home/runner/work/ipg/ipg/README.md L130: R{{"--include_variant_peptides
    true?"}}
  • schema_params - Param test_bundle from nextflow config not found in nextflow_schema.json
  • multiqc_config - assets/multiqc_config.yml does not meet requirements: Section sanjaysgk-ipg-summary missing in report_section_order
  • multiqc_config - assets/multiqc_config.yml does not contain a matching 'report_comment'.
    The expected comment is:
    This report has been generated by the <a href="https://github.com/sanjaysgk/ipg/tree/dev" target="_blank">sanjaysgk/ipg</a> analysis pipeline. For information about how to interpret these results, please see the <a href="https://nf-co.re/ipg/dev/docs/output" target="_blank">documentation</a>.
    The current comment is:
    This report has been generated by the <a href="https://github.com/sanjaysgk/ipg/tree/dev" target="_blank">sanjaysgk/ipg</a> analysis pipeline. For information about how to interpret these results, please see the <a href="https://github.com/sanjaysgk/ipg/dev/docs/output" target="_blank">documentation</a>.

❗ Test warnings:

  • readme - README did not have a Nextflow minimum version badge.
  • readme - README did not have an nf-core template version badge.
  • schema_lint - Schema 'description' should be 'Immunopeptidogenomics — cryptic peptide database construction from RNA-seq.'
    Found: 'Immunopeptidomics Cryptic peptide databse construction from RNAseq'

❔ Tests ignored:

  • files_exist - File is ignored: .github/workflows/nf-test.yml
  • files_exist - File is ignored: .github/actions/get-shards/action.yml
  • files_exist - File is ignored: .github/actions/nf-test/action.yml
  • files_exist - File is ignored: tests/default.nf.test
  • nextflow_config - Config variable ignored: manifest.version
  • files_unchanged - File ignored due to lint config: .gitattributes
  • files_unchanged - File ignored due to lint config: .prettierrc.yml
  • files_unchanged - File ignored due to lint config: .github/CONTRIBUTING.md
  • files_unchanged - File ignored due to lint config: .github/ISSUE_TEMPLATE/bug_report.yml
  • files_unchanged - File ignored due to lint config: .github/PULL_REQUEST_TEMPLATE.md
  • files_unchanged - File ignored due to lint config: .github/workflows/linting_comment.yml
  • actions_nf_test - '.github/workflows/nf-test.yml' not found

❔ Tests fixed:

✅ Tests passed:

Run details

  • nf-core/tools version 3.5.2
  • Run at 2026-04-15 01:13:02


DB_COMPARE_PHASE1 was being invoked with bare `[]` for the two
optional list inputs. Nextflow still stages a file in that case but
gives it a name that doesn't match the module's 'NO_FILE' guard, so
the phase2_args conditional fires and adds '-j  -u  ' to the R
command — db_compare_v2.R then fails optparse with:

    error: flag "j" requires an argument

Swap to the existing assets/NO_FILE sentinel so the module correctly
skips those flags in phase 1. Surfaced by a real D122_Liver post_ms
run on SLURM; same bug would affect every phase-1 invocation.
fix(post_ms): NO_FILE sentinel for phase-1 empty list inputs
Phase 1 passes NO_FILE for both discard_list and unconventional_list.
Nextflow refuses to stage two input files with the same filename in
the same work dir — errors with:

    input file name collision -- There are multiple input files for
    each of the following file names: NO_FILE

Scope the two path inputs into separate staging subdirs (discard/ and
unconv/) so the collision is impossible regardless of what the caller
supplies. The downstream script contract is unchanged (both paths are
still only used when discard_list.name != 'NO_FILE'). Second fix in a
row for the D122_Liver post_ms run — first was the NO_FILE sentinel
itself (PR #7).
Two pieces:

1. .gitignore — carve tests/data/ out of the existing 'data/' rule with
   a negation so test fixtures can be committed while real run outputs
   stay ignored.
2. tests/data/post_ms/ — minimum-viable synthetic fixture exercising
   the --step post_ms subworkflow (Phase 1 → ORIGINS simple → Phase 2
   → ORIGINS full). Hand-built peptide CSVs with 5 cryptic-only rows,
   3 shared across DBs, and 2 below-threshold noise rows — guarantees
   every downstream module sees non-empty input. Runs in ~30s, no
   licensed tools. README.md has the launch command and expected
   outputs.

Complements the chr22 db_construct bundle and the pending HepG2
ms_search fixture per project_test_suite_plan.md (Path A).
fix(db_compare): stageAs for NO_FILE collision + synthetic post_ms test fixture
VennDiagram::venn.diagram with cat.prompts=TRUE prompts the user during
layout calculation. In a non-interactive R session (any nf-core run),
adjust.venn returns NA for max/min, then the script crashes with:

    Error in if (max.x - min.x >= max.y - min.y) {
      missing value where TRUE/FALSE needed
    Calls: venn.diagram -> <Anonymous> -> adjust.venn

cat.prompts=TRUE is debugging mode — must be FALSE for headless runs.
Bug surfaced by the synthetic post_ms test fixture; same call would
fail any post_ms run on real data once the column-case + sentinel
fixes land.

Two related fixes for db_compare module:

1. Newer PEAKS exports use '-10LgP' (capital L), R's check.names
   converts to 'X.10LgP', dplyr::select(X.10lgP) misses → script
   crashes selecting columns. Pre-process headers with sed before R
   reads them. Operates on copies so staged inputs aren't mutated.

2. After PR #8 added stageAs: 'discard/*' to disambiguate the two
   NO_FILE inputs, the .name property started returning 'discard/NO_FILE'
   not 'NO_FILE' — breaking the phase-1/phase-2 gate. Switch to
   discard_list.size() > 0 (NO_FILE is zero bytes by design) so the
   check works regardless of stageAs path tricks.

Both surfaced by the synthetic post_ms test fixture exercising the
real D122_Liver path — every reported error now resolved end-to-end
through DB_COMPARE_PHASE1.
Header has 20 fields, original data rows had 21 — extra empty field
between Accession and Area columns. R read.csv aligned columns wrong
(Mass values into X.10lgP slot, Length set to 0 for all rows), which
emptied the post-filter peptide lists fed to venn.diagram and tripped
the cat.prompts=TRUE bug. Drop one comma per row so PTM/AScore/Area
align correctly.
fix(db_compare): three runtime bugs surfaced by post_ms test fixture
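The field-count repair described above (the real fix is a sed one-liner operating on copies) can be illustrated in Python — purely a sketch, with a hypothetical function name:

```python
import csv
import io

def repair_extra_field(csv_text, expected_fields):
    """Drop one empty field from rows that carry exactly one extra
    column, so PTM/AScore/Area columns line back up with the header.
    Assumes the spurious column is the first empty field in the row."""
    reader = csv.reader(io.StringIO(csv_text))
    fixed = []
    for row in reader:
        if len(row) == expected_fields + 1 and "" in row:
            row.remove("")  # removes the first empty field only
        fixed.append(row)
    return fixed
```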

The Dockerfile at containers/ipg-tools/Dockerfile builds six C tools
(curate_vcf, revert_headers, alt_liftover, triple_translate, squish,
origins) for the singularity/docker container image. The pixi engine
profile bypasses containers entirely, so these binaries need to exist
in bin/ for any --profile pixi run to work.

Other tools (squish, triple_translate, etc.) were already gitignored
under bin/ but their build path wasn't documented anywhere — leaving
new clones unable to run the pipeline in pixi mode without manual
gcc invocations.

This script:
- Compiles all six tools into bin/ with the same flags as the Dockerfile
- Gracefully overwrites pre-existing binaries
- Is the single source of truth for the pixi build path

Surfaced by the D122_Liver post_ms run hitting 'origins: command not
found' after the singularity profile failed to pull the non-existent
ghcr.io/sanjaysgk/ipg-tools:0.2.0 container.

bin/origins added to .gitignore alongside the existing tool entries.
build: add bin/build_ipg_tools.sh — compiles origins + 5 kescull tools for pixi env
Same shape as params_D122_liver.yaml; only input samplesheet differs.
Paired with launcher at /fs04/scratch2/xy86/sanjay/ipg-runs/D122_lung_full/run_db_construct.sh
(not in repo — launcher lives alongside the run output tree).
assets: add D122_lung samplesheet and params
sanjaysgk changed the title from "Dev/cryptic port" to "release(v0.1.0): base IPG — db_construct + post_ms validated on real data" on Apr 15, 2026
sanjaysgk merged commit 739f280 into main on Apr 15, 2026
15 of 18 checks passed
