# 1.0.0dev: first development snapshot
First tagged development snapshot of the nf-core port of the 31-step immunopeptidogenomics (IPG) cryptic peptide pipeline described by Scull et al. (Mol Cell Proteomics, 2021).
> [!NOTE]
> This is a pre-release. The pipeline is not yet feature-complete, and the `test` profile exercises only a chr22 subset of one D100-liver sample. The first stable `1.0.0` release will be cut after end-to-end validation against the full ATLANTIS cohort.
## Pipeline overview
The 31 legacy bash steps from `Select_steps_D122_*.sh` are grouped into six typed nf-core subworkflows:

| Subworkflow | Legacy steps | Tools |
|---|---|---|
| `align_qc` | 1–3 | STAR two-pass + samtools sort/index + RSeQC `infer_experiment.py` |
| `transcript_assembly` | 4–5 | StringTie + gffcompare (`-R -V -C`) |
| `bam_prep` | 6–12 | GATK4 RNA-seq best practices: FastqToSam → MergeBamAlignment → MarkDuplicates → SplitNCigarReads → ValidateSamFile |
| `bqsr` | 13–16 | GATK4 BaseRecalibrator (×2) + ApplyBQSR + AnalyzeCovariates |
| `mutect_calling` | 17–23 | Mutect2 (tumour-only) + LearnReadOrientationModel + GetPileupSummaries + CalculateContamination + FilterMutectCalls + SelectVariants + `curate_vcf` |
| `db_construct` | 24–31 | IndexFeatureFile + FastaAlternateReferenceMaker + `revert_headers` + `gff3sort` + `alt_liftover` + gffread + `triple_translate` + `squish` |
The full Mermaid diagram, including the conditional fan-out gated by `--include_variant_peptides`, is in `README.md`.
## Quick start
```bash
git clone https://github.com/sanjaysgk/ipg.git
cd ipg
pixi install
pixi run bash bin/build_test_bundle.sh # one-time chr22 test bundle build
pixi run nextflow run . -profile test,pixi --outdir results
```
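Or pull directly via Nextflow at this revision. Container profiles such as `docker` need the not-yet-published `ipg-tools` image for the IPG-tool steps (see Known limitations):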
```bash
nextflow run sanjaysgk/ipg -r 1.0.0dev -profile test,docker --outdir results
```
Five run profiles ship: `test`, `pixi` (no containers), `singularity`, `docker`, `monash` (M3 SLURM, `comp` partition, `xy86` project).
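After `pixi install`, a quick sanity check is to compare the tool versions in the pixi environment against the pins quoted under What is in this snapshot. The sketch below is not shipped with the pipeline. Both helper function names are hypothetical. The expected version strings are copied from these notes, and each tool's output format is an assumption. `pixi.lock` remains the authoritative list.

```bash
#!/usr/bin/env bash
# Hypothetical sanity check (not part of the pipeline): compare tool versions
# in the pixi environment with the pins quoted in these release notes.
# Expected strings and tool output formats are assumptions; pixi.lock is authoritative.

# has_version EXPECTED TEXT: succeed if TEXT contains EXPECTED verbatim.
has_version() {
    grep -qF -- "$1" <<<"$2"
}

check_pixi_tools() {
    local status=0 entry expected cmd out
    # Each entry is "<expected substring>|<command run inside the pixi env>".
    local checks=(
        "25.10.4|nextflow -version"
        "2.7.11b|STAR --version"
        "samtools 1.23.1|samtools --version"
        "3.0.0|stringtie --version"
        "0.12.10|gffcompare --version"
    )
    for entry in "${checks[@]}"; do
        expected="${entry%%|*}"
        cmd="${entry#*|}"
        # $cmd is word-split on purpose; none of the arguments contain spaces.
        # shellcheck disable=SC2086
        out="$(pixi run $cmd 2>&1)"
        if has_version "$expected" "$out"; then
            echo "ok       $cmd"
        else
            echo "MISMATCH $cmd (expected '$expected')" >&2
            status=1
        fi
    done
    return "$status"
}
```

Run `check_pixi_tools` from the repository root after `pixi install`. The function returns non-zero if any tool reports an unexpected version.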
## What is in this snapshot
- 23 nf-core/modules pinned via `modules.json` (STAR, samtools, RSeQC, StringTie, gffcompare, gffread, FastQC, MultiQC, 15 GATK4 tools).
- 8 local modules under `modules/local/` for IPG-specific tools and GATK4 wrappers not yet upstream in nf-core/modules: `curate_vcf`, `revert_headers`, `alt_liftover`, `triple_translate`, `squish`, `gff3sort`, `gatk4_validatesamfile`, `gatk4_fastaalternatereferencemaker`.
- `containers/ipg-tools/`: a reproducible Docker build with the five custom IPG C sources bundled in `containers/ipg-tools/src/`. Two of those sources carry locally developed production fixes. Both fixes have been pushed upstream to `sanjaysgk/immunopeptidogenomics@a09a74c` for separately citable provenance:
  - `curate_vcf.c` (V2): larger field buffers (chrom 200, ref/alt 4096, filter 1024), dynamic INFO field allocation, and explicit `free()` cleanup.
  - `revert_headers.c`: an optional 3-argument form that takes an output FASTA prefix and writes a `.fasta` file under that prefix directly, instead of the legacy hard-coded `tmpc.fasta`.
- `pixi.lock`, committed to pin a bit-for-bit reproducible toolchain (Nextflow 25.10.4, GATK4 4.6.2, STAR 2.7.11b, samtools 1.23.1, StringTie 3.0.0, gffcompare 0.12.10, OpenJDK 17; full list in `CHANGELOG.md`).
- `--include_variant_peptides` parameter (default `false`) to optionally fold alt-reference, variant-derived peptides into the cryptic peptide DB. The default matches the legacy Scull 2021 / D122_Lung run, as verified empirically against the legacy `squish.log`.
- `bin/build_test_bundle.sh` reproducible chr22 test bundle builder (idempotent, env-var-overridable).
- nf-test harness with 8 snapshot-stable per-module stub tests.
- GitHub Actions workflows: `ci.yml`, a parse-check matrix (Nextflow 24.04.2 and 25.10.4 × docker and singularity), and `build-ipg-tools.yml`, which publishes `ghcr.io/sanjaysgk/ipg-tools` on pushes to `main` and on tags matching `ipg-tools-v*`.
- Mermaid pipeline diagram and a rewritten `README.md`.
- `CITATIONS.md` with the full reference list for every tool in the pipeline.
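The `ci.yml` parse-check matrix can be approximated locally with a loop like the one below. This is a sketch, not the CI definition. It assumes the workflow runs `nextflow config` with `-profile test,<engine>` from the repository root, and `ci.yml` may differ in detail. `NXF_VER` is Nextflow's standard environment variable for choosing which Nextflow version the launcher runs. The function names `ci_matrix` and `run_parse_checks` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical local approximation of the ci.yml parse-check matrix.
# ASSUMPTION: CI parses the config with `-profile test,<engine>`; check ci.yml.
# Run from the repository root so `nextflow config` picks up nextflow.config.

NXF_VERSIONS=(24.04.2 25.10.4)
ENGINES=(docker singularity)

# Print one "<nextflow-version> <profile>" line per matrix cell.
ci_matrix() {
    local v e
    for v in "${NXF_VERSIONS[@]}"; do
        for e in "${ENGINES[@]}"; do
            printf '%s test,%s\n' "$v" "$e"
        done
    done
}

# Parse the config once per matrix cell. NXF_VER makes the launcher fetch and
# use that Nextflow version. Returns non-zero if any cell fails to parse.
run_parse_checks() {
    local v profile status=0
    while read -r v profile; do
        if NXF_VER="$v" nextflow config -profile "$profile" >/dev/null; then
            echo "ok   Nextflow $v -profile $profile"
        else
            echo "FAIL Nextflow $v -profile $profile" >&2
            status=1
        fi
    done < <(ci_matrix)
    return "$status"
}
```

Like CI, this checks only that the configuration parses. It does not replace the local end-to-end run on Monash.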
## Variant calling caveat (please read before using)
`--include_variant_peptides` does **not** switch the variant caller to matched tumour-normal mode; this pipeline does not support matched tumour-normal calling. Variant calling always runs Mutect2 in tumour-only mode against a gnomAD-style germline allele-frequency database. The flag controls only whether the discovered variants are folded into the final cryptic peptide DB. Set it to `true` only when the sample is expected to harbour biologically meaningful somatic variants (e.g. tumour tissue, hypermutated samples, MMR-deficient cell lines). Leave it at `false` (the default) for normal tissue, other cell lines, or any sample where variant peptides would mostly add noise.
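The guidance above amounts to a small decision rule. The sketch below encodes it in a hypothetical wrapper that is not part of the pipeline. The category labels (`tumour`, `hypermutated`, `mmr_deficient`) and the function names are illustrative only and are not pipeline parameters. Only `--include_variant_peptides` itself comes from the pipeline.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper encoding the variant-peptide guidance. Category labels
# are illustrative only; they are NOT pipeline parameters.

# Print the value to pass to --include_variant_peptides for a sample category.
variant_peptides_flag() {
    case "$1" in
        tumour | hypermutated | mmr_deficient)
            echo true ;;   # somatic variants expected to be biologically meaningful
        *)
            echo false ;;  # normal tissue, other cell lines, anything else: keep the default
    esac
}

# Launch the pipeline with the flag derived from the category. All remaining
# arguments (profile, outdir, inputs) are passed through unchanged.
# Mutect2 runs tumour-only either way; only the final DB content changes.
run_ipg() {
    local category="$1"
    shift
    nextflow run sanjaysgk/ipg -r 1.0.0dev "$@" \
        --include_variant_peptides "$(variant_peptides_flag "$category")"
}

# Example (add the pipeline's input parameters as described in README.md):
#   run_ipg tumour -profile pixi --outdir results_tumour
```

Nextflow converts the string values `true` and `false` on the command line to booleans, so the wrapper can pass the flag value as plain text.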
## Known limitations (the gap to `1.0.0`)
- The chr22 test bundle is built locally on Monash M3 scratch and is not accessible from GitHub Actions runners. CI runs only `nextflow config` parse checks; the full end-to-end test must be run locally on Monash.
- The pipeline has not yet been validated end-to-end against a full real sample. This is the next milestone toward `1.0.0`.
- The `ghcr.io/sanjaysgk/ipg-tools` container image referenced by the five IPG-tool local modules has not been published yet. Pushing an `ipg-tools-v0.1.0` git tag will trigger the GHA workflow that publishes the first image. Until then only the `pixi` profile (which uses tools from the local PATH) can run end-to-end without modification.
- `tests/default.nf.test` pipeline-level integration test is not yet wired up.
- `nf-core pipelines lint` is configured with several intentional exclusions documented inline in `.nf-core.yml`.
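Publishing the first `ipg-tools` image comes down to pushing a tag that matches the `ipg-tools-v*` trigger. The sketch below adds a guard so that a mistyped tag, which would silently fail to trigger `build-ipg-tools.yml`, is rejected first. It assumes the GitHub remote is named `origin`. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper for cutting an ipg-tools image tag.
# ASSUMPTION: the GitHub remote is named "origin".

# Succeed if the tag matches the build-ipg-tools.yml trigger pattern ipg-tools-v*.
is_ipg_tools_tag() {
    [[ "$1" == ipg-tools-v* ]]
}

# Create an annotated tag on the current commit and push it to trigger the build.
publish_ipg_tools() {
    local tag="${1:-ipg-tools-v0.1.0}"
    if ! is_ipg_tools_tag "$tag"; then
        echo "error: '$tag' does not match ipg-tools-v*; the workflow would not run" >&2
        return 1
    fi
    git tag -a "$tag" -m "ipg-tools container image $tag" &&
        git push origin "$tag"
}

# Example, from a clean checkout of the commit to build:
#   publish_ipg_tools ipg-tools-v0.1.0
```

Pipeline release tags such as `1.0.0dev` deliberately do not match the pattern. This keeps image builds and pipeline releases independent.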
## Authors
| Name | Contribution | Affiliation |
|---|---|---|
| Sanjay SG Krishna | pipeline port | Li Lab, Monash University |
| Kate Scull | original IPG method, custom C tools | Purcell Lab, Monash University |
| Chen Li | supervision | Li Lab, Monash University |
| Anthony W. Purcell | supervision | Purcell Lab, Monash University |
## Citation
If you use this version, please cite the original method paper:
> Scull KE, Pandey K, Ramarathinam SH, Purcell AW. Immunopeptidogenomics: harnessing RNA-seq to illuminate the dark immunopeptidome. *Mol Cell Proteomics*. 2021;20:100143. https://doi.org/10.1016/j.mcpro.2021.100143
Full reference list for every tool used: `CITATIONS.md`
Full release notes: `CHANGELOG.md`