
Releases: sanjaysgk/ipg

1.0.0dev — first development snapshot

09 Apr 04:57

Pre-release

First tagged development snapshot of the nf-core port of the 31-step IPG cryptic peptide pipeline (Scull et al., Mol Cell Proteomics 2021).

Note

This is a pre-release. The pipeline is not yet feature-complete and the test profile only exercises a chr22 subset of one D100-liver sample. The first stable 1.0.0 release will be cut after end-to-end validation against the full ATLANTIS cohort.

Pipeline overview

The 31 legacy bash steps from Select_steps_D122_*.sh are grouped into six typed nf-core subworkflows:

| Subworkflow | Legacy steps | Tools |
| --- | --- | --- |
| `align_qc` | 1–3 | STAR two-pass + samtools sort/index + RSeQC `infer_experiment.py` |
| `transcript_assembly` | 4–5 | StringTie + gffcompare (`-R -V -C`) |
| `bam_prep` | 6–12 | GATK4 RNA-seq best practices: FastqToSam → MergeBamAlignment → MarkDuplicates → SplitNCigarReads → ValidateSamFile |
| `bqsr` | 13–16 | GATK4 BaseRecalibrator (×2) + ApplyBQSR + AnalyzeCovariates |
| `mutect_calling` | 17–23 | Mutect2 (tumour-only) + LearnReadOrientation + GetPileupSummaries + CalculateContamination + FilterMutectCalls + SelectVariants + curate_vcf |
| `db_construct` | 24–31 | IndexFeatureFile + FastaAlternateReferenceMaker + revert_headers + gff3sort + alt_liftover + gffread + triple_translate + squish |

The full diagram (with the conditional fan-out gated by --include_variant_peptides) is in README.md.
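When cross-referencing a legacy script against the port, it can help to look up which subworkflow now owns a given step. The sketch below encodes only the step ranges from the table above; the function name `legacy_step_to_subworkflow` is hypothetical and is not part of the pipeline.

```bash
# Map a legacy step number (1-31) to the subworkflow that now contains it.
# Ranges are copied from the overview table; the helper itself is illustrative.
legacy_step_to_subworkflow() {
    local step="$1"
    # Reject anything that is not a plain non-negative integer.
    case "$step" in
        ''|*[!0-9]*) echo "not a step number: $step" >&2; return 1 ;;
    esac
    if [ "$step" -lt 1 ] || [ "$step" -gt 31 ]; then
        echo "step out of range (1-31): $step" >&2
        return 1
    elif [ "$step" -le 3 ];  then echo "align_qc"
    elif [ "$step" -le 5 ];  then echo "transcript_assembly"
    elif [ "$step" -le 12 ]; then echo "bam_prep"
    elif [ "$step" -le 16 ]; then echo "bqsr"
    elif [ "$step" -le 23 ]; then echo "mutect_calling"
    else                          echo "db_construct"
    fi
}

legacy_step_to_subworkflow 14   # prints: bqsr
```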

Quick start

```bash
git clone https://github.com/sanjaysgk/ipg.git
cd ipg
pixi install
pixi run bash bin/build_test_bundle.sh # one-time chr22 test bundle build
pixi run nextflow run . -profile test,pixi --outdir results
```

Or pull directly via Nextflow at this revision:

```bash
nextflow run sanjaysgk/ipg -r 1.0.0dev -profile test,docker --outdir results
```

Five run profiles ship: `test`, `pixi` (no containers), `singularity`, `docker`, `monash` (M3 SLURM, `comp` partition, `xy86` project).
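A quick way to sanity-check the profiles locally is to resolve the configuration for each one with `nextflow config`. The sketch below only prints the commands so that it has no side effects. The helper name `print_profile_checks` is hypothetical, and whether every profile resolves on its own (without being combined with `test`) is an assumption.

```bash
# Print one `nextflow config` parse-check command per shipped profile.
# Hypothetical helper: the commands are printed, not executed.
print_profile_checks() {
    local profile
    for profile in test pixi singularity docker monash; do
        echo "nextflow config -profile ${profile} ."
    done
}

print_profile_checks
```

Piping the output to `bash` from the repository root would run the checks for real.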

What is in this snapshot

  • 23 nf-core/modules pinned via `modules.json` (STAR, samtools, RSeQC, StringTie, gffcompare, gffread, FastQC, MultiQC, 15 GATK4 tools).
  • 8 local modules under `modules/local/` for IPG-specific tools and GATK4 wrappers not yet upstream in nf-core/modules: `curate_vcf`, `revert_headers`, `alt_liftover`, `triple_translate`, `squish`, `gff3sort`, `gatk4_validatesamfile`, `gatk4_fastaalternatereferencemaker`.
  • `containers/ipg-tools/` reproducible Docker build with the five custom IPG C sources bundled in `containers/ipg-tools/src/`. Two of those sources carry locally-developed production fixes:
    • `curate_vcf.c` (V2): larger field buffers (chrom 200, ref/alt 4096, filter 1024), dynamic INFO field allocation, explicit `free()` cleanup.
    • `revert_headers.c`: optional 3-arg form for the output FASTA prefix (writes `<prefix>.fasta` directly instead of the legacy hardcoded `tmpc.fasta`).
      Both improvements have been pushed upstream to `sanjaysgk/immunopeptidogenomics@a09a74c` for separate citeable provenance.
  • `pixi.lock` committing a bit-for-bit reproducible toolchain (Nextflow 25.10.4, GATK4 4.6.2, STAR 2.7.11b, samtools 1.23.1, StringTie 3.0.0, gffcompare 0.12.10, OpenJDK 17 — full list in `CHANGELOG.md`).
  • `--include_variant_peptides` parameter (default `false`) to optionally fold alt-reference variant-derived peptides into the cryptic peptide DB. Default matches the legacy Scull 2021 / D122_Lung run, verified empirically against the legacy `squish.log`.
  • `bin/build_test_bundle.sh` reproducible chr22 test bundle builder (idempotent, env-var-overridable).
  • nf-test harness with 8 snapshot-stable per-module stub tests.
  • GitHub Actions workflows: `ci.yml` parse-check matrix (Nextflow 24.04.2 + 25.10.4 × docker + singularity), and `build-ipg-tools.yml` that publishes `ghcr.io/sanjaysgk/ipg-tools` on push to `main` and on tags matching `ipg-tools-v*`.
  • Mermaid pipeline diagram and a rewritten `README.md`.
  • `CITATIONS.md` with the full reference list for every tool in the pipeline.

Variant calling caveat (please read before using)

`--include_variant_peptides` does NOT switch the variant caller to matched tumour-normal mode. Variant calling is always performed in tumour-only Mutect2 mode against a gnomAD-style germline allele-frequency database, and this pipeline does not support matched tumour-normal calling. The flag controls only whether the discovered variants are folded into the final cryptic peptide DB. Set it to `true` only when the sample is expected to harbour biologically meaningful somatic variants (e.g. tumour tissue, hypermutated samples, MMR-deficient cell lines). Leave it at `false` (the default) for normal tissue, other cell lines, or any sample where variant peptides would mostly add noise.
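As a rough illustration of the guidance above, the sketch below picks a flag value from a sample label and prints the matching test-profile command. The helper `ipg_variant_flag` and the labels `tumour`, `hypermutated`, `mmr_deficient` and `normal` are hypothetical; only the flag name and its default come from these notes. In every case the caller still runs tumour-only.

```bash
# Choose a value for --include_variant_peptides from a sample label.
# Hypothetical helper; the labels are illustrative, not pipeline inputs.
ipg_variant_flag() {
    case "$1" in
        tumour|hypermutated|mmr_deficient) echo "true" ;;
        *) echo "false" ;;   # pipeline default: variants stay out of the DB
    esac
}

sample_kind="tumour"
# Printed rather than executed so the sketch has no side effects.
echo "pixi run nextflow run . -profile test,pixi" \
     "--include_variant_peptides $(ipg_variant_flag "$sample_kind")" \
     "--outdir results"
```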

Known limitations (the gap to `1.0.0`)

  • The chr22 test bundle is built locally on Monash M3 scratch and is not accessible from GitHub Actions runners. CI runs only `nextflow config` parse checks; the full end-to-end test must be run locally on Monash.
  • The pipeline has not yet been validated end-to-end against a full real sample. This is the next milestone toward `1.0.0`.
  • The `ghcr.io/sanjaysgk/ipg-tools` container image referenced by the five IPG-tool local modules has not been published yet. Pushing an `ipg-tools-v0.1.0` git tag will trigger the GHA workflow that publishes the first image. Until then only the `pixi` profile (which uses tools from the local PATH) can run end-to-end without modification.
  • `tests/default.nf.test` pipeline-level integration test is not yet wired up.
  • `nf-core pipelines lint` is configured with several intentional exclusions documented inline in `.nf-core.yml`.
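For the container limitation above, the publishing workflow is triggered by a tag matching `ipg-tools-v*`. The sketch below checks a candidate tag name against that pattern with a shell glob and then prints the tag-and-push commands. The helper `is_ipg_tools_tag` is hypothetical and the remote name `origin` is an assumption. A shell glob only approximates GitHub's tag filter syntax (for example, `*` in a GitHub filter does not match `/`).

```bash
# Return 0 if a tag name matches the ipg-tools-v* pattern quoted in these notes.
# Hypothetical helper; a shell glob approximates the GitHub Actions tag filter.
is_ipg_tools_tag() {
    case "$1" in
        ipg-tools-v*) return 0 ;;
        *)            return 1 ;;
    esac
}

tag="ipg-tools-v0.1.0"
if is_ipg_tools_tag "$tag"; then
    # Printed rather than executed; "origin" is an assumed remote name.
    echo "git tag ${tag} && git push origin ${tag}"
else
    echo "tag '${tag}' would not trigger the image build" >&2
fi
```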

Authors

| Name | Role | Affiliation |
| --- | --- | --- |
| Sanjay SG Krishna | pipeline port | Li Lab, Monash University |
| Kate Scull | original IPG method, custom C tools | Purcell Lab, Monash University |
| Chen Li | supervision | Li Lab, Monash University |
| Anthony W. Purcell | supervision | Purcell Lab, Monash University |

Citation

If you use this version, please cite the original method paper:

Scull KE, Pandey K, Ramarathinam SH, Purcell AW.
Immunopeptidogenomics: harnessing RNA-seq to illuminate the dark immunopeptidome.
Mol Cell Proteomics. 2021;20:100143.
doi.org/10.1016/j.mcpro.2021.100143

Full reference list for every tool used: `CITATIONS.md`
Full release notes: `CHANGELOG.md`

License

MIT