Welcome to the Nextflow workshop! In this hands-on tutorial, you'll learn how to build scalable, reproducible computational pipelines using Nextflow.
Nextflow is a workflow management system for building computational pipelines that are:
- Portable: Run on laptops, HPC clusters, or cloud
- Reproducible: Track every step with consistent results
- Scalable: Automatic parallelization across resources
- Flexible: Deploy anywhere (local, SLURM, AWS, etc.)
- Tool-agnostic: Integrate any command-line program (Python, R, bash scripts, compiled binaries)
Traditional computational workflows require:
- Complex bash scripts with nested loops
- Manual file tracking
- Custom parallelization logic
- Difficult result reproduction
- Messy integration of diverse tools
Nextflow handles:
- Automatic task parallelization
- Data flow management
- Failure recovery and checkpointing
- Clear maintainable code
- Seamless tool integration
Before we start, here are the core concepts you'll learn:
- Processes: Individual computational tasks (running any command-line tool or script)
- Channels: Data streams that connect processes together
- Workflows: The orchestration of processes and data flow
- Operators: Methods to transform and manipulate channels (`.map()`, `.collect()`, etc.)
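As a quick preview of operators (covered hands-on later in the workshop), here is a minimal sketch; the values and the doubling step are just for illustration:

```nextflow
// Double each value, then gather everything into a single list
workflow {
    Channel.of(1, 2, 3)
        .map { it * 2 }   // emits 2, 4, 6
        .collect()        // emits one item: [2, 4, 6]
        .view()           // prints [2, 4, 6]
}
```

Each operator returns a new channel, which is why the calls chain together.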
Example Domain: We use bioinformatics examples in this workshop, but the concepts apply to any computational domain:
- Image processing and computer vision
- Climate modeling and simulations
- Text analysis and natural language processing
- Machine learning pipelines
- Statistical analysis workflows
- Any field requiring batch processing of data
Tools Used: Throughout this workshop, you'll see how Nextflow integrates:
- Published tools: FastQC and Fastp (widely-used bioinformatics programs)
- Custom scripts: Python scripts written specifically for this analysis
- Your own tools: The same principles apply to any command-line program you use
The key insight: Nextflow doesn't care what your tools do - it just manages how data flows between them!
In this workshop, you'll progress through increasingly complex pipelines:
Scripts 01-05 (This tutorial):
- Script 01: Hello World - Your first Nextflow process
- Script 02: Working with files - Reading and writing data
- Script 03: Publishing outputs - Making results easy to find
- Script 04: Process inputs - Passing data to processes
- Script 05: Parameters - Making pipelines configurable
Scripts 06-10 (Implementation tutorial):
- Script 06: Quality control with FastQC (published tool)
- Script 07: Data trimming with Fastp (published tool)
- Script 08: Parallel workflows (running multiple tools simultaneously)
- Script 09: Collecting results with ReadLenDist (custom Python script)
- Script 10: Complete multi-step pipeline (integrating everything)
This tutorial is designed to be:
- Hands-on: You'll run every script and see the results
- Progressive: Each script builds on concepts from the previous one
- Practical: Real tools and data from actual research workflows
- Interactive: Experiment, break things, and learn!
- Transferable: Apply these skills to your own research domain
Let's get started!
Use the link: Nova OnDemand
Please login using your ISU credentials.
We will be using the VS Code Server on the ISU HPC cluster for this tutorial.
cd /work/short_term/<ISU_NetID>
git clone https://github.com/ISUgenomics/isu-nextflow-workshop.git
cp -a /work/short_term/workshop2_bash/01_data .

Learning Goals:
- Understand the basic structure of a Nextflow script
- Learn what a process is and how it works
- See how the workflow block orchestrates processes
- Use the `.view()` operator to display output
First, let's see what we're automating. Run this command directly in your terminal:
echo 'Welcome to the world of Nextflow!'

Output:
Welcome to the world of Nextflow!
This prints a greeting directly in the shell. Simple, right? But what if you need to:
- Run this on 100 different inputs?
- Track when it was run and with what parameters?
- Resume if it fails?
- Run it on different compute systems?
That's where Nextflow comes in!
module load nextflow
module list

This loads the Nextflow module on the HPC cluster.
Now let's look at our first Nextflow script:

cat pipelines/01_hello_screen.nf
File: pipelines/01_hello_screen.nf
#!/usr/bin/env nextflow
/*
* Use echo to print a message to the screen
*/
process hello {
output:
stdout
script:
"""
echo "Welcome to the world of Nextflow!"
"""
}
workflow {
// Run the hello process
hello().view()
}

The Shebang Line
#!/usr/bin/env nextflow

- Declares this file as a Nextflow script
- Allows the script to be executed directly (like a bash script)
- Not strictly required, but good practice
The Process Block
process hello {
output:
stdout
script:
"""
echo "Welcome to the world of Nextflow!"
"""
}

What is a Process?
A process is a basic computing unit in Nextflow. Think of it as a wrapper around any command-line tool or script.
Components:
- `process hello`: Names the process "hello"
- `output: stdout`: Captures standard output (what the command prints)
  - This creates a channel containing the output
  - Channels are how data flows between processes
- `script: """ ... """`: Contains the actual command(s) to run
  - Triple quotes allow multi-line commands
  - Can contain any bash/shell commands
  - Variables can be interpolated with `${variable}`
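For example, here is a hypothetical variant of the process (the process name and message are made up for illustration) that sets a Groovy variable and interpolates it into the command:

```nextflow
process greet {
    output:
    stdout

    script:
    def name = "Nextflow"   // Groovy variable, resolved before the command runs
    """
    echo "Hello, ${name}!"
    """
}

workflow {
    greet().view()
}
```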
Key Insight: The process doesn't run immediately - it's just a definition. The workflow block decides when to run it.
The Workflow Block
workflow {
// Run the hello process
hello().view()
}

What is a Workflow?
The workflow block is where you orchestrate your processes - deciding which to run, in what order, and with what data.
Components:
- `hello()`: Executes the hello process
  - Returns a channel containing the process output
  - In this case, the channel contains "Welcome to the world of Nextflow!"
- `.view()`: A channel operator that prints channel contents to the terminal
  - Useful for debugging and seeing what's in a channel
  - Without `.view()`, the output would be captured but not displayed
Data Flow:
hello process → stdout channel → .view() → terminal
nextflow run pipelines/01_hello_screen.nf

Expected Output:
N E X T F L O W ~ version 24.04.4
Launching `pipelines/01_hello_screen.nf` [maniac_albattani] DSL2 - revision: 91b2c7c409
executor > local (1)
[9c/4c931d] process > hello [100%] 1 of 1 ✔
Welcome to the world of Nextflow!
What does each line mean?
N E X T F L O W ~ version 24.04.4
- Nextflow version being used
Launching `pipelines/01_hello_screen.nf` [maniac_albattani] DSL2 - revision: 91b2c7c409
- Script being run
- Random name assigned to this run ("maniac_albattani")
- DSL2: Nextflow's Domain Specific Language version 2
- Git revision (if in a git repository)
executor > local (1)
- Executor: where the process runs (local machine, SLURM, AWS, etc.)
- (1): One process executed
[9c/4c931d] process > hello [100%] 1 of 1 ✔
- `[9c/4c931d]`: Unique hash for this process execution
  - Used to find the work directory: work/9c/4c931d.../
- `process > hello`: Name of the process
- `[100%] 1 of 1 ✔`: Progress indicator; 1 task completed successfully
Welcome to the world of Nextflow!
- The actual output from our process (displayed by `.view()`)
You've learned:
- A Nextflow script has processes (what to do) and a workflow (when to do it)
- Processes capture output into channels
- The `.view()` operator displays channel contents
- Nextflow tracks every execution with a unique hash
- Each process runs in its own work directory
Learning Goals:
- Learn how to create file outputs instead of stdout
- Understand the Nextflow work directory structure
- See where process outputs are stored
- Learn to inspect process execution artifacts
Building on Script 01:
In Script 01, we used output: stdout to print to the terminal. But real pipelines need to save results to files. Script 02 shows you how!
cat pipelines/02_hello_redirect.nf
File: pipelines/02_hello_redirect.nf
#!/usr/bin/env nextflow
/*
* Hello redirect: write greeting to file
*/
process hello {
output:
path 'result.txt'
script:
"""
echo "Welcome to the world of Nextflow!" > result.txt
"""
}
workflow {
hello()
}

Comparing the two scripts
Script 01:
process hello {
output:
stdout // Captures what's printed to terminal
script:
"""
echo "Welcome to the world of Nextflow!"
"""
}
workflow {
hello().view() // .view() displays the stdout
}

Script 02:
process hello {
output:
path 'result.txt' // Captures a file
script:
"""
echo "Welcome to the world of Nextflow!" > result.txt
"""
}
workflow {
hello() // No .view() needed - output is a file
}

Key Differences:
- Output type: `stdout` → `path 'result.txt'`
- Script command: direct echo → redirect to a file with `>`
- Workflow: no `.view()` needed (the file is automatically saved)
The path output qualifier
output:
path 'result.txt'

What does path mean?
- `path` tells Nextflow: "This process creates a file"
- `'result.txt'` is the filename to capture
- Nextflow will look for this file after the process completes
- The file is automatically added to a channel (for downstream processes)
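A process is not limited to a single output file, either. As a sketch (the process and file names here are illustrative), you can declare several `path` outputs, including glob patterns:

```nextflow
process make_files {
    output:
    path 'summary.txt'
    path 'parts/*.csv'   // glob: captures every CSV the script writes

    script:
    """
    mkdir parts
    echo "done" > summary.txt
    echo "a,1"  > parts/a.csv
    echo "b,2"  > parts/b.csv
    """
}

workflow {
    make_files()
}
```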
Important: The filename in output: must match the filename created in script:
output:
path 'result.txt' // Nextflow expects this file
script:
"""
echo "..." > result.txt // Script must create this file
"""nextflow run pipelines/02_hello_redirect.nfExpected Output:
N E X T F L O W ~ version 24.04.4
Launching `pipelines/02_hello_redirect.nf` [determined_hopper] DSL2 - revision: 5ff76c72d0
executor > local (1)
[1a/e8c8c0] process > hello [100%] 1 of 1 ✔
Notice: No output is printed! That's because the result is in a file, not stdout.
Where did the file go?
The Work Directory:
Nextflow stores all process outputs in the work/ directory. Let's explore it:
tree -a work

Output:
work
└── 1a
└── e8c8c0a1b2c3d4e5f6g7h8i9j0k1l2m3
├── .command.begin
├── .command.err
├── .command.log
├── .command.out
├── .command.run
├── .command.sh
├── .exitcode
└── result.txt ← Your output file!
2 directories, 8 files
Structure:
- `work/`: Main directory for all process executions
- `1a/`: First two characters of the process hash
- `e8c8c0.../`: Full unique hash for this specific execution
- `result.txt`: Your output file
- `.command.*`: Nextflow's internal files
What are all these .command files?
Each process execution creates several files:
Your Files:
result.txt: The output file your process created
Nextflow's Files:
- `.command.sh`: The actual shell script Nextflow generated and ran (look inside to see exactly what was executed!)
- `.command.run`: Wrapper script that sets up the environment
- `.command.out`: Standard output (stdout) from the process
- `.command.err`: Standard error (stderr) from the process
- `.command.log`: Combined log of stdout and stderr
- `.exitcode`: Exit status (0 = success, non-zero = error)
- `.command.begin`: Timestamp when the process started
Why is this useful?
- Debugging: check `.command.err` if something fails
- Reproducibility: `.command.sh` shows exactly what ran
- Verification: `.exitcode` confirms success or failure
Let's read the file using the hash from the Nextflow output:
# Use the hash from your output (e.g., [1a/e8c8c0])
cat work/1a/e8c8c0*/result.txt

Output:
Welcome to the world of Nextflow!
Pro tip: You can also use:
find work -name result.txt -exec cat {} \;

You've learned:
- Use `output: path 'filename'` to create file outputs
- The filename in `output:` must match what the script creates
- All process outputs go to the `work/` directory
- Each process gets a unique subdirectory with a hash
- The work directory contains debugging artifacts (`.command.*` files)
- You can inspect these files to understand what happened
Learning Goals:
- Learn to use `publishDir` to copy outputs to accessible locations
- Understand the difference between the work directory and the published directory
- Learn about different `publishDir` modes
- Make your results easy to find and share
Building on Script 02:
In Script 02, outputs went to the work/ directory with cryptic hashes. That's great for Nextflow's internal management, but terrible for humans! Script 03 shows you how to publish outputs to user-friendly locations.
Script 02 output location:
work/1a/e8c8c0a1b2c3d4e5f6g7h8i9j0k1l2m3/result.txt
Problems:
- Hard to find (need to look up the hash)
- Changes every run (new hash each time)
- Gets deleted if you clean the work directory
- Not suitable for sharing results
Solution: Use publishDir to copy/link outputs to a permanent, accessible location!
cat pipelines/03_hello_publishdir.nf
File: pipelines/03_hello_publishdir.nf
#!/usr/bin/env nextflow
/*
* Create an output directory where the output will be saved
*/
process hello {
publishDir 'output', mode: 'copy'
output:
path 'result.txt'
script:
"""
echo "Hello Nextflow World!" > result.txt
"""
}
workflow {
// Run the hello process
hello()
}

The publishDir directive
Script 02:
process hello {
output:
path 'result.txt'
script:
"""
echo "Welcome to the world of Nextflow!" > result.txt
"""
}

Script 03:
process hello {
publishDir 'output', mode: 'copy' // NEW!
output:
path 'result.txt'
script:
"""
echo "Hello Nextflow World!" > result.txt
"""
}

The only change: Added `publishDir 'output', mode: 'copy'`
How publishDir works
publishDir 'output', mode: 'copy'

What it does:
- Takes files from the work directory
- Copies (or links) them to the specified directory
- Happens after the process completes successfully
Components:
- `'output'`: Target directory name
  - Can be any path: `'results'`, `'my_outputs'`, `'/absolute/path'`
  - Created automatically if it doesn't exist
- `mode: 'copy'`: How to publish the file
Data Flow:
Process runs → Creates result.txt in work/ → publishDir copies to output/
publishDir modes
Common modes:
- `mode: 'copy'` (most common)
  - Creates a copy of the file
  - The original stays in the work directory
  - Safe: deleting the published file doesn't affect the work directory
- `mode: 'symlink'`
  - Creates a symbolic link
  - Saves disk space (no duplicate)
  - Faster than copy
  - Warning: if you clean the work directory, the link breaks!
- `mode: 'move'`
  - Moves the file (removes it from the work directory)
  - Saves disk space
  - Warning: you can't resume if you delete the published file!
Recommendation: Use 'copy' unless disk space is critical.
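`publishDir` also accepts further options. As a sketch (the subdirectory name and pattern are illustrative), you can publish into a nested directory and restrict which files get published:

```nextflow
process hello {
    // Only files matching the pattern are published; the subdirectory is created as needed
    publishDir 'output/greetings', mode: 'copy', pattern: '*.txt'

    output:
    path 'result.txt'

    script:
    """
    echo "Hello Nextflow World!" > result.txt
    """
}

workflow {
    hello()
}
```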
nextflow run pipelines/03_hello_publishdir.nf

Expected Output:
N E X T F L O W ~ version 24.04.4
Launching `pipelines/03_hello_publishdir.nf` [determined_hopper] DSL2 - revision: 5ff76c72d0
executor > local (1)
[2b/f9d1e2] process > hello [100%] 1 of 1 ✔
Now the output is in an easy-to-find location!
ls -la output/

Output:
total 8
drwxr-xr-x 3 user group 96 Oct 19 01:30 .
drwxr-xr-x 8 user group 256 Oct 19 01:30 ..
-rw-r--r-- 1 user group 23 Oct 19 01:30 result.txt
View the file:
cat output/result.txt

Output:
Hello Nextflow World!
Much easier than:
cat work/2b/f9d1e2a3b4c5d6e7f8g9h0i1j2k3l4m5/result.txt

Work directory vs. Published directory
After running Script 03, you have TWO copies:
- Work directory (Nextflow's internal copy): work/2b/f9d1e2.../result.txt
  - Used for pipeline management
  - Used for `-resume` functionality
  - Can be cleaned up later
- Published directory (your accessible copy): output/result.txt
  - Easy to find and share
  - Permanent location
  - Safe to use in downstream analysis
Best Practice: Keep work directory for development, clean it periodically. Keep published outputs permanently.
You've learned:
- `publishDir` copies outputs to accessible locations
- Use `mode: 'copy'` for safety (most common)
- Published files are separate from the work directory
- The work directory is still created (for resume functionality)
- Published outputs are easy to find, share, and use
Learning Goals:
- Learn how to define process inputs
- Understand the `val` input qualifier
- See how to pass data from the workflow to a process
- Use variables in process scripts
Building on Script 03:
Scripts 01-03 had hardcoded messages. But real pipelines need to process different data! Script 04 introduces process inputs - the foundation for making processes reusable.
Script 03:
process hello {
script:
"""
echo "Hello Nextflow World!" > result.txt // Always the same!
"""
}

Problems:
- Can't change the message without editing the script
- Can't process different inputs
- Not reusable
Solution: Add an input block to accept data!
cat pipelines/04_hello_input.nf
File: pipelines/04_hello_input.nf
#!/usr/bin/env nextflow
process hello {
publishDir 'output', mode: 'copy'
input:
val welcome
output:
path 'result.txt'
script:
"""
echo "$welcome" > result.txt
"""
}
workflow {
hello("Hello, welcome to the world of Nextflow!")
}

Adding the input block
Script 03:
process hello {
publishDir 'output', mode: 'copy'
output:
path 'result.txt'
script:
"""
echo "Hello Nextflow World!" > result.txt
"""
}
workflow {
hello() // No arguments
}

Script 04:
process hello {
publishDir 'output', mode: 'copy'
input: // NEW!
val welcome // NEW!
output:
path 'result.txt'
script:
"""
echo "$welcome" > result.txt // Uses variable!
"""
}
workflow {
hello("Hello, welcome to the world of Nextflow!") // Pass data
}

Key Changes:
- Added an `input:` block with `val welcome`
- Changed the script to use the `$welcome` variable
- The workflow now passes a string to `hello()`
The input block
input:
val welcome

What does this mean?
- `input:`: Declares what data the process needs
- `val`: Input qualifier meaning "value" (string, number, etc.)
- `welcome`: Variable name to use in the script
Input qualifiers:
- `val`: Simple values (strings, numbers, booleans)
- `path`: Files or directories (we'll use this in Scripts 06-10)
- `tuple`: Multiple values grouped together
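As a preview of `tuple` (the sample ID and file path below are hypothetical), a single channel item can carry several related values:

```nextflow
process describe {
    input:
    tuple val(sample_id), path(reads)

    output:
    stdout

    script:
    """
    echo "Sample ${sample_id} uses file ${reads}"
    """
}

workflow {
    // 'data/a.txt' is a hypothetical input file
    samples = Channel.of( ['sampleA', file('data/a.txt')] )
    describe(samples).view()
}
```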
Using inputs in scripts
script:
"""
echo "$welcome" > result.txt
"""Variable interpolation:
$welcome: Bash-style variable reference- Nextflow replaces
$welcomewith the actual value before running - Works like bash variables in the script block
Example: If welcome = "Hello!", the script becomes:
echo "Hello!" > result.txtPassing data in the workflow
workflow {
hello("Hello, welcome to the world of Nextflow!")
}

How it works:
- The workflow calls `hello()` with a string argument
- The string is passed to the process's `input` block
- The value is assigned to the `welcome` variable
- The process script uses `$welcome`
Data Flow:
Workflow: "Hello, welcome..." → Process input: welcome → Script: $welcome
nextflow run pipelines/04_hello_input.nf

Expected Output:
N E X T F L O W ~ version 24.04.4
Launching `pipelines/04_hello_input.nf` [determined_hopper] DSL2 - revision: 5ff76c72d0
executor > local (1)
[3c/a1b2c3] process > hello [100%] 1 of 1 ✔
cat output/result.txt

Output:
Hello, welcome to the world of Nextflow!
The message came from the workflow, not hardcoded in the process!
You've learned:
- Processes can accept inputs using the `input:` block
- Use `val` for simple values (strings, numbers)
- Variables are accessed with `$variable_name` in scripts
- The workflow passes data to processes as arguments
- This makes processes reusable with different data
Learning Goals:
- Learn to use `params` for configurable pipelines
- Understand default parameter values
- Learn to override parameters from the command line
- Make pipelines flexible and reusable
Building on Script 04:
Script 04 required editing the workflow to change the message. But what if you want users to customize the pipeline without editing code? Script 05 introduces parameters - the standard way to make Nextflow pipelines configurable!
Script 04:
workflow {
hello("Hello, welcome to the world of Nextflow!") // Hardcoded!
}

To change the message, you must:
- Open the script file
- Edit the string
- Save the file
Not user-friendly!
Solution: Use params to allow command-line configuration!
cat pipelines/05_hello_default.nf
File: pipelines/05_hello_default.nf
#!/usr/bin/env nextflow
process hello {
publishDir 'output', mode: 'copy'
input:
val welcome
output:
path 'result.txt'
script:
"""
echo "$welcome" > result.txt
"""
}
params.welcome = "Hello, welcome to the world of Nextflow!"
workflow {
hello(params.welcome)
}

Adding parameters
Script 04:
workflow {
hello("Hello, welcome to the world of Nextflow!")
}

Script 05:
params.welcome = "Hello, welcome to the world of Nextflow!" // NEW!
workflow {
hello(params.welcome) // Uses parameter
}

Key Changes:
- Added `params.welcome` with a default value
- The workflow uses `params.welcome` instead of a hardcoded string
What are params?
Parameters are Nextflow's way of making pipelines configurable.
params.welcome = "Hello, welcome to the world of Nextflow!"

Key Concepts:
- `params`: Special Nextflow object for parameters
- `.welcome`: Parameter name (you choose this)
- `= "..."`: Default value (used if not overridden)
Naming convention: Use lowercase with underscores
- Good: `params.input_file`, `params.output_dir`, `params.quality_threshold`
- Avoid: `params.InputFile`, `params.OUTDIR`
How parameters work
Three ways to set parameters:
1. Default value in the script (lowest priority):
   params.welcome = "Default message"
2. Command-line argument (highest priority):
   nextflow run script.nf --welcome "Custom message"
3. Config file (medium priority; covered in advanced topics):
   params { welcome = "Config message" }
Priority: Command-line > Config file > Script default
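For completeness, here is a sketch of what that config file could look like; `nextflow.config` is the default file name Nextflow picks up from the launch directory:

```groovy
// nextflow.config (sketch)
params {
    welcome = "Config message"
}
```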
nextflow run pipelines/05_hello_default.nf

Expected Output:
N E X T F L O W ~ version 24.04.4
Launching `pipelines/05_hello_default.nf` [determined_hopper] DSL2 - revision: 5ff76c72d0
executor > local (1)
[4d/b2c3d4] process > hello [100%] 1 of 1 ✔
Check the output:
cat output/result.txt

Output:
Hello, welcome to the world of Nextflow!
The default value was used!
Now the magic - change the message without editing the script:
nextflow run pipelines/05_hello_default.nf --welcome "Greetings from the command line!"

Check the output:
cat output/result.txt

Output:
Greetings from the command line!
It worked! The command-line value overrode the default.
How this applies to real pipelines
Typical bioinformatics pipeline parameters:
// Input/Output
params.reads = "data/*_{R1,R2}.fastq.gz"
params.output_dir = "results"
// Quality control
params.min_quality = 20
params.min_length = 50
// Analysis
params.genome = "/path/to/reference.fa"
params.threads = 4

Users can customize without editing:
nextflow run pipeline.nf \
  --reads "my_data/*.fastq.gz" \
  --output_dir "my_results" \
  --min_quality 30 \
  --threads 8

This is how Scripts 06-10 work!
Best practices for parameters
1. Provide sensible defaults:
params.threads = 4 // Good default
params.output_dir = "results" // Reasonable

2. Document your parameters:
// Input files (glob pattern)
params.reads = "data/*.fastq.gz"
// Quality threshold (Phred score)
params.min_quality = 20

3. Group related parameters:
// Input/Output
params.input_dir = "data"
params.output_dir = "results"
// Quality Control
params.min_quality = 20
params.min_length = 50

4. Use descriptive names:
- Good: `params.quality_threshold`, `params.input_fastq`
- Bad: `params.qt`, `params.in`
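A related habit, shown here as a sketch with a hypothetical `params.reads`, is to validate required parameters up front so users get a clear message instead of a confusing downstream failure:

```nextflow
params.reads = null   // required: no sensible default exists

workflow {
    // Fail fast with an explicit error if the user forgot the parameter
    if( !params.reads ) {
        error "Missing required parameter: --reads '<glob pattern>'"
    }
    Channel.fromPath(params.reads).view()
}
```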
You've learned:
- Use `params.name = value` to define parameters
- Parameters provide default values
- Override with `--param_name value` on the command line
- Command-line values take priority over defaults
- Parameters make pipelines user-friendly and reusable
- This is the foundation for configurable bioinformatics pipelines
Congratulations! You've completed the foundational scripts (01-05). You now understand:
- Processes and workflows
- File outputs and publishDir
- Process inputs
- Parameters
You're ready for the implementation tutorial (Scripts 06-10) with real bioinformatics tools!
Script 01 - Hello World:
- Basic Nextflow script structure
- Process definitions and workflow orchestration
- Using `.view()` for debugging

Script 02 - File Outputs:
- Creating file outputs with `path`
- Understanding the work directory
- Inspecting `.command.*` files for debugging

Script 03 - Publishing:
- Using `publishDir` to make outputs accessible
- Different publishing modes (copy, symlink, move)
- Separating the work directory from final outputs

Script 04 - Process Inputs:
- Defining process inputs with `val`
- Variable interpolation in scripts
- Passing data from workflow to process

Script 05 - Parameters:
- Making pipelines configurable with `params`
- Setting default values and overriding from the command line
- Best practices for parameter naming
Now you'll apply these concepts to build a real bioinformatics pipeline:
Script 06 - FastQC Quality Control:
- Run a published tool (FastQC) on multiple files in parallel
Script 07 - Paired-End Read Trimming:
- Handle paired-end sequencing data with Fastp
Script 08 - Parallel Workflows:
- Run multiple processes simultaneously
Script 09 - Collecting Results:
- Use `.collect()` to aggregate files
- Run custom Python scripts
Script 10 - Complete Pipeline:
- Chain processes together with channel transformations
Open the implementation tutorial:
cat implementation.md

Or in your editor:

code implementation.md

Throughout Scripts 01-05, Nextflow created a work/ directory with cryptic subdirectories. Let's understand why this exists and when to clean it up.
The work directory serves two critical purposes:
1. Pipeline execution workspace
   - Each process runs in its own isolated subdirectory
   - Contains all inputs, outputs, and execution logs
   - Enables debugging (you can inspect exactly what happened)
2. Resume functionality
   - Nextflow tracks which tasks completed successfully
   - Allows skipping already-completed work
   - Essential for long-running pipelines
How resume works
Imagine you run a pipeline with 100 samples, and it fails on sample 95. Without -resume, you'd have to rerun all 100 samples!
With -resume, Nextflow is smart:
# First run (fails at sample 95)
nextflow run script.nf
# Fix the issue, then resume
nextflow run script.nf -resume

What happens:
- Nextflow checks the work directory
- Finds that samples 1-94 completed successfully
- Skips those tasks (uses cached results)
- Only runs sample 95 onwards
How it knows:
- Each task has a unique hash based on:
- Process script
- Input files
- Parameters
- If hash matches and task succeeded, reuse the result!
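You can see this hashing behavior for yourself with the Script 05 pipeline from earlier:

```bash
nextflow run pipelines/05_hello_default.nf                          # task runs
nextflow run pipelines/05_hello_default.nf -resume                  # same inputs, same hash: cached
nextflow run pipelines/05_hello_default.nf -resume --welcome "Hi"   # input changed, new hash: task reruns
```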
When to use -resume
Use -resume when:
- Your pipeline failed and you fixed the issue
- You stopped the pipeline and want to continue
- You added more samples and want to process only the new ones
- You're iterating during development
Example scenario:
# Run pipeline on 10 samples
nextflow run script.nf --input "data/*.fastq"
# (Completes successfully)
# Add 5 more samples to data/
# Run again with -resume
nextflow run script.nf --input "data/*.fastq" -resume
# Only processes the 5 new samples!

During development (now): Keep it!
- You'll experiment with scripts
- You'll use `-resume` frequently
- Disk space is minimal (Scripts 01-05 are tiny)
After pipeline completion: Optional cleanup
- Published outputs are in `output/` (safe to keep)
- The work directory can be deleted to save space
- Only matters when working with large files
If you want to clean up before moving to the implementation tutorial:
# Remove work directory (safe - outputs are published)
rm -rf work/
# Remove output directory (if you want a fresh start)
rm -rf output/

Or keep everything:
# Do nothing! The work directory from Scripts 01-05 is tiny.
# You'll learn more about cleanup strategies in the implementation
# tutorial when working with real sequencing data (GB of files).

Using nextflow clean (optional)
Nextflow provides a clean command for more selective cleanup:
# See what would be deleted (dry run)
nextflow clean -n
# Delete all work files
nextflow clean -f
# Remove work files but keep execution log entries and metadata (-k = -keep-logs)
nextflow clean -f -k
# Delete work files older than a certain run
nextflow clean -f -before <run_name>

For now, a simple rm -rf work/ is fine. You'll learn more about nextflow clean in the implementation tutorial.
# Run a pipeline
nextflow run script.nf
# Override parameters
nextflow run script.nf --param_name value
# Resume from checkpoint
nextflow run script.nf -resume

// Parameters
params.input = "data/*.fastq"
// Process
process MyProcess {
publishDir 'results', mode: 'copy'
input:
path input_file
output:
path 'output.txt'
script:
"""
my_tool $input_file > output.txt
"""
}
// Workflow
workflow {
Channel.fromPath(params.input) | MyProcess
}

You've built a solid foundation. Now let's apply it to real-world analysis!
Next: Implementation Tutorial (Scripts 06-10)
In the implementation tutorial, you'll process real sequencing data and build a complete analysis pipeline.