STAR-suite reorganizes STAR into module-focused directories while keeping a single source of truth for shared code. Build outputs remain compatible with existing STAR workflows, and the new top-level Makefile exposes module targets.
No new external dependencies are required for the suite modules in this repo. The current integrations (including STAR-perturb, STAR-SLAM, and QC updates) are built with the existing toolchain and vendored components.
STAR-suite supports partial compilation: build only the module/tool targets you need instead of building the full suite every time.
Agent quickstart: see AGENTS.md for repo-specific guardrails, tests, and recent changes.
core/
  legacy/                # Upstream STAR layout (single source of truth)
  features/              # Shared overlays and feature tooling
    process_features/    # Perturb feature extraction/calling implementation
    feature_barcodes/    # assignBarcodes/demux tooling
    libscrna/            # EmptyDrops/OrdMag/Occupancy shared library
flex/                    # Flex-specific code + tools
slam/                    # SLAM-seq code + tools
build/                   # Modular make fragments
docs/                    # Suite-level docs
tests/                   # Suite-level tests (see tests/ARTIFACTS.md for artifact locations)
tools/                   # Suite-level scripts/utilities
mcp_server/              # MCP server for scripted discovery/preflight/run workflows
- STAR-core (core/): Legacy STAR (indexing, bulk, Solo) plus shared utilities. Build: make core (binary at core/legacy/source/STAR).
- STAR-perturb (core/legacy/ + core/features/process_features/): CR-compatible perturb-seq path with integrated feature extraction/calling (process_features + call_features) and crispr_analysis/ outputs in CR-compat mode. Primary run path: STAR --crMultiConfig ... --defaultCrCompat yes (see STAR-perturb section below).
- STAR-Flex (flex/): FlexFilter pipeline and Flex-specific integrations. Build tools: make flex or make flex-tools.
- STAR-SLAM (slam/): SLAM-seq quantification, SNP masking, trimming/QC. Build tools: make slam or make slam-tools.
- Feature Barcodes (core/features/feature_barcodes/): Vendored process_features tools for perturb-seq testing (assignBarcodes, demux_bam, demux_fastq). Build tools: make feature-barcodes-tools.
- Shared Feature Toolchains (core/features/): Reusable tool layers used across modules, including vbem (TranscriptVB helpers), yremove_* (Y/noY splitting), bamsort, and libscrna. Build tools: make vbem-tools, make yremove-tools, plus in-core integrations.
- MCP Server (tooling) (mcp_server/): Agent automation service for dataset/test discovery and controlled execution (list_datasets, list_test_suites, preflight, run_script, collect_outputs). This is repo tooling, not an analysis module.
Build from repo root:
# Core STAR binary
make core
# Module-focused builds
make flex
make slam
make feature-barcodes-tools
# Build everything
make all
Selective default build:
make default INCLUDE="core slam-tools"
make default EXCLUDE="flex-tools"
The top-level Makefile supports a default build, full build, and conditional include/exclude filters.
- Default build: make (same as make default)
  - Builds the “usual culprits” (core + common tools).
- Optional filters:
  - make default INCLUDE="core flex-tools"
  - make default EXCLUDE="slam-tools yremove-tools"
- Build everything: make all
  - Includes everything in the suite (core + all tools).
Run make help to see the full target list and descriptions.
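For scripted or CI use, the selective build can be wrapped in a small helper. The following is a minimal sketch that assumes only the make default INCLUDE=/EXCLUDE= interface described above; suite_build and DRY_RUN are hypothetical names introduced for illustration and are not part of the repo.

```shell
# suite_build FILTER target...   (hypothetical helper, not part of the repo)
# FILTER is INCLUDE or EXCLUDE; the remaining arguments are Makefile targets.
# With DRY_RUN=1 the command is printed instead of executed.
suite_build() (
  filter="$1"
  shift
  case "$filter" in
    INCLUDE|EXCLUDE) ;;
    *) echo "filter must be INCLUDE or EXCLUDE" >&2; exit 2 ;;
  esac
  targets="$*"   # join the targets with single spaces
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'make default %s="%s"\n' "$filter" "$targets"
  else
    make default "$filter=$targets"
  fi
)

# Print the command that would build only the core binary and the SLAM tools.
DRY_RUN=1 suite_build INCLUDE core slam-tools
```

The function body runs in a subshell, so its variables do not leak into the calling script, and an invalid filter returns a non-zero status without invoking make.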
Recent updates to the Core module (STAR 2.7.11b and prior) include:
- Batch Mode (single-pass, non-Solo): --batchMode 1 processes multiple FASTQs in one STAR invocation while reusing the loaded genome. This removes the need for --genomeLoad keep-in-memory workflows that are often brittle in containerized and HPC job environments. It is also important when analyses require shared static inputs across many samples (for example SLAM SNP masks and blank-derived background/error settings), so each sample is processed under the same fixed context.
  - Limits: batch mode is single-pass only (no --twopassMode) and not supported with Solo (--soloType).
  - Output routing: use --outFileNamePrefixAuto 1 for per-sample subdirectories under one output root.
- Transcriptome Output: Replaced --quantTranscriptomeBan with --quantTranscriptomeSAMoutput for more explicit control (e.g., BanSingleEnd_ExtendSoftclip).
- TranscriptVB Quantification: Variational Bayes and EM quantification for transcript-level abundance (--quantMode TranscriptVB), with parity-oriented behavior against Salmon alignment-mode.
- Reference Automation: Automated reference download/build (--autoIndex, --autoCksumUpdate) plus automatic transcriptome.fa generation during indexing for transcript-level quant workflows.
- Cutadapt-Compatible Trimming: Native cutadapt-style trimming path (--trimCutadapt Yes) for bulk/PE workflows.
- Samtools-style BAM Sorting: Spill-to-disk sort (--outBAMsortMethod samtools) to reduce peak RAM pressure versus in-memory bin sorting.
- Y/NoY Separation: Split BAM and FASTQ outputs by chrY alignment (--emitNoYBAM, --emitYNoYFastq).
- EmptyDrops_CR Integration: CR-compatible EmptyDrops path (including libscrna-backed behavior in scRNA/perturb flows).
- Solo Features:
  - sF BAM tag for feature type and gene counts.
  - --soloCBtype String for arbitrary barcode strings.
  - Improved cell filtering and statistics with --soloCellReadStats Standard.
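The batch-mode limits listed above can be checked before a job is submitted. This is a minimal sketch of such a preflight check, not something STAR-suite ships: batch_args_ok is a hypothetical helper, and the only assumptions about STAR are the flag names from the text and that None is the value that leaves --twopassMode and --soloType switched off.

```shell
# batch_args_ok ARG...   (hypothetical preflight helper, not part of the repo)
# Returns non-zero when --batchMode 1 is combined with an active
# --twopassMode or --soloType, which the text says is unsupported.
batch_args_ok() (
  batch=0
  bad=""
  prev=""
  for arg in "$@"; do
    case "$prev" in
      --batchMode)
        if [ "$arg" = "1" ]; then batch=1; fi ;;
      --twopassMode|--soloType)
        # "None" leaves the feature off, so it is not a conflict.
        if [ "$arg" != "None" ]; then bad="$bad $prev"; fi ;;
    esac
    prev="$arg"
  done
  if [ "$batch" = "1" ] && [ -n "$bad" ]; then
    echo "not supported with --batchMode 1:$bad" >&2
    exit 1
  fi
)

batch_args_ok --batchMode 1 --outSAMtype BAM SortedByCoordinate && echo "ok"
batch_args_ok --batchMode 1 --twopassMode Basic || echo "rejected"
```

The check only looks at flag/value pairs on the command line; options supplied through a parameters file would not be seen.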
STAR-Flex extends STAR-core with Flex-specific behavior:
- Flex Pipeline: Inline hash-based processing for 10x Genomics Flex (Fixed RNA Profiling). Includes sample tag detection, 1MM pseudocount correction for CBs, clique-based UMI deduplication, and occupancy filtering.
Integrated SLAM-seq quantification with GRAND-SLAM parity:
- Quantification: Full gene-level NTR estimation (Binomial/EM models).
- Compatibility Mode: --slamCompatMode gedi enables GEDI-compatible behaviors (intronic classification, lenient overlap, overlap weighting) for parity testing.
- Auto-Trimming: Variance-based detection of artifact-prone read ends (--autoTrim variance).
- QC: Comprehensive reports for T->C rates and error modeling.
- Batch Layout + Blank-First: --outFileNamePrefixAuto 1 organizes SLAM outputs into alignments/, counts/, qc/, y_separated/ under a single root, and --slamErrorRateFromBlank 1 can seed the background error rate from a blank (e.g. no4sU).
- Binary Dump + Requant: --slamDumpBinary 1 --slamDumpWeights 1 emits <sample>_slam_dump.bin and <sample>_slam_weights.bin in alignments/ (batch + auto prefix layout). The slam_requant tool can re-quantify these dumps with exact parity to SlamQuant.out (Pearson/Spearman 1.0 in the 1M parity check).
- Binary Dump Format: bitwise header + record layout is documented in slam/docs/SLAM_DUMP_FORMAT.md.
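After a batch run with --outFileNamePrefixAuto 1, a quick check that the expected subdirectories exist can catch misrouted output early. The sketch below is an illustration only: missing_slam_dirs is a hypothetical helper, and it assumes the four subdirectories named above sit directly under the directory passed to it (y_separated/ may legitimately be absent when Y-splitting is not enabled).

```shell
# missing_slam_dirs DIR   (hypothetical helper, not part of the repo)
# Prints the expected SLAM output subdirectories that are absent under DIR.
missing_slam_dirs() (
  root="$1"
  missing=""
  for d in alignments counts qc y_separated; do
    [ -d "$root/$d" ] || missing="${missing:+$missing }$d"
  done
  printf '%s\n' "$missing"
)

# Demo on a throwaway directory containing only two of the four subdirectories.
root=$(mktemp -d)
mkdir -p "$root/alignments" "$root/counts"
missing_slam_dirs "$root"    # prints: qc y_separated
rm -rf "$root"
```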
STAR-suite includes a perturb-seq path that combines CR-compatible Solo behavior with integrated CRISPR feature calling. This is the path used for STAR-perturb work and CR compatibility comparisons.
- Integrated CR-compat in STAR (GEX + feature merge + CRISPR calling):
  - Use --crMultiConfig <multi_config.csv>
  - Recommended bundle: --defaultCrCompat yes
  - Key controls:
    - --crMinUmi 10 (default; lower to 2-3 for lineage-barcode style assays)
    - --soloCrGexFeature GeneFull (or Gene when explicitly required)
- Standalone feature pipeline tool (core/legacy/source/star_feature_call):
  - Full pipeline: FASTQ -> MEX -> calls
  - Call-only mode: MEX -> calls
  - --compat-perturb writes CR9-style crispr_analysis/ outputs.
- A375 small-set parity result:
  - On the A375 1k CRISPR 5' small set, STAR CRISPR calling matched Cell Ranger at 1083/1083 common barcodes (100.0% exact-match) when using min-UMI 10.
  - Reference report: tests/crispr_feature_calling_comparison_report.md.
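An exact-match figure of the form "matched/common barcodes" can be computed from two per-barcode call tables. The sketch below shows one way to do that with awk; it is not the method used for the referenced report. It assumes two hypothetical tab-separated files with a barcode in column 1 and the assigned feature call in column 2, and exact_match_rate is a name introduced here.

```shell
# exact_match_rate A.tsv B.tsv   (hypothetical helper; input format is assumed)
# Prints "<identical calls>/<barcodes present in both files>".
exact_match_rate() {
  awk -F '\t' '
    NR == FNR { ref[$1] = $2; next }   # first file: remember the call per barcode
    ($1 in ref) {                      # second file: only barcodes seen in both
      common++
      if (ref[$1] == $2) same++
    }
    END { printf "%d/%d\n", same, common }
  ' "$1" "$2"
}

# Toy data: two shared barcodes, one of which has the same call in both files.
a=$(mktemp); b=$(mktemp)
printf 'AAAC\tsgRNA1\nAAAG\tsgRNA2\nAAAT\tsgRNA3\n' > "$a"
printf 'AAAC\tsgRNA1\nAAAG\tsgRNA9\nCCCC\tsgRNA4\n' > "$b"
exact_match_rate "$a" "$b"    # prints: 1/2
rm -f "$a" "$b"
```

Barcodes present in only one file are excluded from the denominator, which mirrors the "common barcodes" wording above.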
STAR-Flex and STAR-SLAM now generate detailed QC reports:
- SLAM QC (--slamQcReport <prefix>): Generates an interactive HTML report (.html) and JSON metrics (.json) visualizing:
  - T->C conversion rates per read position.
  - Variance analysis for auto-trimming (Stdev curves, segmented regression fits).
  - Trimming overlays showing chosen 5'/3' cut sites.
- FlexFilter QC (flexfilter_summary.tsv):
  - Cell calling statistics (EmptyDrops/OrdMag results).
  - Cell counts, UMI thresholds, and filtering rates per sample.
Standard STAR flags apply. See core/legacy/README.md.
- --runMode: alignReads, genomeGenerate, soloCellFiltering
- --genomeDir: Path to genome index
- --readFilesIn: Input read files
- --outSAMtype: Output SAM/BAM format (e.g., BAM SortedByCoordinate)
- --batchMode: Batch multiple FASTQs in one run (bulk, single-pass only; no Solo or 2-pass)
- --soloType: Single-cell mode (e.g., CB_UMI_Simple, SmartSeq)
- --soloCbUbRequireTogether: Enforce CB/UB tag pairing for tag injection (yes/no, default yes)
- --soloCrGexFeature: CR-compat merged GEX source (auto, gene, genefull)
See flex/README_flex.md for full reference.
- Pipeline:
  - --flex yes: Enable Flex pipeline.
  - --soloFlexExpectedCellsPerTag: Expected cells per sample tag.
  - --soloSampleWhitelist: TSV mapping sample tags to labels.
- Trimming:
  - --trimCutadapt Yes: Enable cutadapt-style trimming.
  - --trimCutadaptCompat: Compatibility mode (e.g., Cutadapt3).
- Quantification:
  - --quantMode TranscriptVB: Enable VB/EM quantification.
- Y-Split:
  - --emitNoYBAM yes: Emit _Y.bam and _noY.bam.
  - --emitYNoYFastq yes: Emit split FASTQ files.
- Reference:
  - --autoIndex Yes: Enable automated reference download/build.
  - --cellrangerStyleIndex Yes: Use CellRanger-style reference formatting.
- Sorting:
  - --outBAMsortMethod samtools: Enable spill-to-disk sorting.
See slam/docs/SLAM_COMPATIBILITY_MODE.md and slam/docs/SLAM_seq.md.
- Quantification:
  - --slamQuantMode 1: Enable SLAM quantification.
  - --slamGrandSlamOut 1: Generate GRAND-SLAM compatible output.
  - --slamErrorRateFromBlank 1: Seed error rate from the detection pass (useful when a blank is first).
- Compatibility:
  - --slamCompatMode gedi: Enable GEDI compatibility.
  - --slamCompatIntronic, --slamCompatLenientOverlap: Fine-grained control.
- Trimming:
  - --autoTrim variance: Enable variance-based auto-trimming.
  - --slamTrim5p, --slamTrim3p: Manual trim guards.
- Batch Layout:
  - --outFileNamePrefixAuto 1: Derive sample name from first FASTQ and route outputs into subdirs under --outFileNamePrefix.
See docs/feature_barcodes.md and docs/CRISPR_FEATURE_CALLING_IMPLEMENTATION_SUMMARY.md.
- --crMultiConfig: Enable Cell Ranger-style multi processing with feature libraries.
- --defaultCrCompat yes: Apply the CR-compat perturb defaults bundle.
- --crMinUmi: Minimum UMI threshold for CRISPR feature calling (default 10).
- --soloCrGexFeature: Control merged GEX source (auto, gene, genefull).
- --soloCrMode CR: Enable CR-compatible single-cell behavior.
Standalone tool (star_feature_call) key flags:
- --compat-perturb: CR9-compatible output layout (crispr_analysis/).
- --feature-ref, --whitelist, --fastq-dir, --output-dir: FASTQ -> MEX -> calls.
- --call-only --mex-dir: call_features-only pass on existing MEX.
- --emptydrops-use-fdr, --min-umi, --ratio-test: calling controls.
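The call-only mode has no worked example in this document, so the following dry-run sketch assembles one from the flags listed above and prints it rather than running it. Treat the combination as an assumption: whether --output-dir and --compat-perturb are accepted together with --call-only is not confirmed here, and call_only_cmd is a hypothetical helper name.

```shell
# call_only_cmd MEX_DIR OUT_DIR [MIN_UMI]   (hypothetical dry-run helper)
# Prints a star_feature_call command line for a call-only pass on existing MEX.
# The flag combination is an assumption built from the flag list above.
call_only_cmd() (
  mex_dir="$1"
  out_dir="$2"
  min_umi="${3:-10}"   # 10 is the documented default threshold
  printf '%s ' core/legacy/source/star_feature_call \
    --call-only --mex-dir "$mex_dir" \
    --output-dir "$out_dir" \
    --compat-perturb
  printf '%s\n' "--min-umi $min_umi"
)

call_only_cmd /path/to/mex /path/to/feature_out
```

Printing the command first makes it easy to review the arguments before running them against real data.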
Core alignment:
core/legacy/source/STAR \
--runMode alignReads \
--genomeDir /path/to/genome_index \
--readFilesIn reads.fq.gz \
--readFilesCommand zcat \
--outFileNamePrefix out/ \
--outSAMtype BAM SortedByCoordinate \
--outSAMattributes NH HI AS nM MD
Batch mode (bulk, single-pass, SE):
core/legacy/source/STAR \
--runMode alignReads \
--genomeDir /path/to/genome_index \
--readFilesIn A_R1.fq.gz,B_R1.fq.gz \
--readFilesCommand zcat \
--outFileNamePrefix /path/to/out_root/ \
--outFileNamePrefixAuto 1 \
--batchMode 1 \
--outSAMtype BAM SortedByCoordinate
Batch mode (bulk, single-pass, PE):
core/legacy/source/STAR \
--runMode alignReads \
--genomeDir /path/to/genome_index \
--readFilesIn A_R1.fq.gz,B_R1.fq.gz A_R2.fq.gz,B_R2.fq.gz \
--readFilesCommand zcat \
--outFileNamePrefix /path/to/out_root/ \
--outFileNamePrefixAuto 1 \
--batchMode 1 \
--outSAMtype BAM SortedByCoordinate
Notes:
- Batch mode is single-pass only (not compatible with --twopassMode).
- Batch mode is not supported with Solo (--soloType).
Flex Mode (10x Fixed RNA Profiling):
core/legacy/source/STAR \
--runMode alignReads \
--genomeDir /path/to/flex_index \
--readFilesIn reads_R2.fq.gz reads_R1.fq.gz \
--flex yes \
--soloType CB_UMI_Simple \
--soloCBwhitelist /path/to/737K-fixed-rna-profiling.txt \
--soloSampleWhitelist sample_whitelist.tsv \
--outFileNamePrefix output/
SLAM Mode (Standard):
core/legacy/source/STAR \
--runMode alignReads \
--genomeDir /path/to/genome_index \
--readFilesIn reads.fq.gz \
--readFilesCommand zcat \
--outFileNamePrefix out/ \
--outSAMtype BAM SortedByCoordinate \
--outSAMattributes NH HI AS nM MD \
--slamQuantMode 1 \
--slamSnpBed /path/to/snps.bed
SLAM Mode (GEDI Compatibility):
core/legacy/source/STAR \
--runMode alignReads \
--genomeDir /path/to/genome_index \
--readFilesIn reads.fq.gz \
--slamQuantMode 1 \
--slamCompatMode gedi \
--autoTrim variance \
--outFileNamePrefix output/
SLAM Batch Mode (blank-first, SE/PE):
core/legacy/source/STAR \
--runMode alignReads \
--genomeDir /path/to/genome_index \
--readFilesIn blank_R1.fq.gz,0h_R1.fq.gz,6h_R1.fq.gz,24h_R1.fq.gz \
--readFilesCommand zcat \
--outFileNamePrefix /path/to/out_root/ \
--outFileNamePrefixAuto 1 \
--slamQuantMode 1 \
--slamBatchMode 1 \
--slamErrorRateFromBlank 1 \
--slamSnpBed /path/to/snps.bed
For paired-end, pass two comma-separated mate lists:
--readFilesIn blank_R1.fq.gz,0h_R1.fq.gz,... blank_R2.fq.gz,0h_R2.fq.gz,...
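Because the blank-first layout depends on the blank being the first entry, it can help to enforce that ordering when the list is generated. The sketch below moves a named blank file to the front of a comma-separated list; blank_first is a hypothetical helper, and the file names are the ones from the example above.

```shell
# blank_first BLANK file...   (hypothetical helper, not part of the repo)
# Prints a comma-separated list with BLANK first, followed by the other
# files in their original order; a repeated BLANK entry is dropped.
blank_first() (
  blank="$1"
  shift
  list="$blank"
  for f in "$@"; do
    [ "$f" = "$blank" ] && continue
    list="$list,$f"
  done
  printf '%s\n' "$list"
)

blank_first blank_R1.fq.gz 0h_R1.fq.gz blank_R1.fq.gz 6h_R1.fq.gz 24h_R1.fq.gz
```

For paired-end data the same call would be made once per mate so that both lists keep the blank in first position.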
STAR-perturb (integrated CR-compat mode):
core/legacy/source/STAR \
--runMode alignReads \
--genomeDir /path/to/index \
--crMultiConfig /path/to/multi_config.csv \
--defaultCrCompat yes \
--outFileNamePrefix /path/to/outs/
STAR-perturb (standalone feature pipeline):
core/legacy/source/star_feature_call \
--compat-perturb \
--feature-ref /path/to/feature_reference.csv \
--whitelist /path/to/whitelist.txt \
--fastq-dir /path/to/feature_fastqs \
--filtered-barcodes /path/to/filtered_barcodes.tsv \
--output-dir /path/to/feature_out \
--emptydrops-use-fdr \
--min-umi 10
- Core usage: core/legacy/README.md
- Flex pipeline: flex/README_flex.md
- SLAM compatibility: slam/docs/SLAM_COMPATIBILITY_MODE.md
- SLAM methodology: slam/docs/SLAM_seq.md
- STAR-perturb feature docs: docs/feature_barcodes.md
- STAR-perturb A375 parity report: tests/crispr_feature_calling_comparison_report.md
- Cell Ranger multi smoke tool: docs/cr_multi.md