From 7736ee2335675b6611163bf61afdf5386ef988e2 Mon Sep 17 00:00:00 2001
From: Matthew Fricke
Date: Tue, 23 Jun 2026 12:18:04 -0600
Subject: [PATCH 1/5] Add Seqtk QuickByte stub

---
 README.md | 1 +
 test_seqtk_quickbyte.md | 86 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 87 insertions(+)
 create mode 100644 test_seqtk_quickbyte.md

diff --git a/README.md b/README.md
index e4d200a..c92eff1 100644
--- a/README.md
+++ b/README.md
@@ -51,6 +51,7 @@ Quickbytes are tutorials designed to help CARC users.
  * [Metabarcoding with QIIME2, Mothur, and USEARCH](https://github.com/UNM-CARC/QuickBytes/blob/master/Metabarcoding.md)
  * [BEAST at CARC](https://github.com/UNM-CARC/QuickBytes/blob/master/Beast_at_CARC.md)
  * [Population genetic simulations with msprime (backwards time)](https://github.com/UNM-CARC/QuickBytes/blob/master/msprime_quickbyte.md)
+ * [Seqtk Slurm smoke test](https://github.com/UNM-CARC/QuickBytes/blob/master/test_seqtk_quickbyte.md)
 * Computational Chemistry
  * [Orca on Wheeler and Taos](https://github.com/UNM-CARC/QuickBytes/blob/master/orca_wheeler_taos.md)
  * [Alphafold](https://github.com/UNM-CARC/QuickBytes/blob/master/alphafold.md)
diff --git a/test_seqtk_quickbyte.md b/test_seqtk_quickbyte.md
new file mode 100644
index 0000000..8e81dd7
--- /dev/null
+++ b/test_seqtk_quickbyte.md
@@ -0,0 +1,86 @@
+# Seqtk at CARC
+
+## Software Description
+
+Seqtk provides lightweight FASTA/FASTQ tools. This QuickByte is a stub based on the CARC `test-programs` regression suite. The example is intentionally small so it can run on the `debug` partition and serve as a starting point for adapting the application to a real research workload.
+
+Passing test-program examples used for this stub:
+
+- `seqtk/single-node/slurm-test.sh`: `pass`, job `806485`, elapsed `00:00:01`, CPUs `1`
+
+## Example Slurm Script
+
+Save the following as `slurm-test.sh` in the example directory and submit it with `sbatch slurm-test.sh`.
+
+```bash
+#!/bin/bash -l
+# Run this file with: sbatch slurm-test.sh
+# This script converts a tiny FASTQ file to FASTA with SeqTK.
+
+# Slurm resources for this short SeqTK example.
+#SBATCH --job-name=test-seqtk
+#SBATCH --output=%x-%j.out
+#SBATCH --error=%x-%j.err
+#SBATCH --partition=debug
+#SBATCH --time=00:05:00
+#SBATCH --nodes=1
+#SBATCH --ntasks=1
+#SBATCH --cpus-per-task=1
+#SBATCH --mem=1G
+
+# Test harness: fail fast on errors, unset variables, or failed pipeline commands.
+set -euo pipefail
+
+# Test harness: locate this example directory when submitted from the repo root.
+script_dir="${SLURM_SUBMIT_DIR:-$PWD}"
+if [[ -d "$script_dir/seqtk/single-node" ]]; then
+ script_dir="$script_dir/seqtk/single-node"
+fi
+
+# Test harness: create a clean per-job output directory so runs do not collide.
+run_dir="$script_dir/outputs/${SLURM_JOB_NAME}-${SLURM_JOB_ID}"
+rm -rf "$run_dir"
+mkdir -p "$run_dir"
+cd "$run_dir"
+
+# Fundamental: load SeqTK.
+module --ignore-cache load seqtk/1.4-qhos
+
+# Test harness: write a tiny FASTQ input.
+cat > reads.fq <<'EOF'
+@read1
+ACGTACGT
++
+IIIIIIII
+@read2
+TTTTCCCC
++
+HHHHHHHH
+EOF
+
+# Fundamental: convert FASTQ to FASTA.
+seqtk seq -A reads.fq > reads.fa
+# Test check: confirm both reads are present in FASTA output.
+test "$(grep -c '^>' reads.fa)" -eq 2
+# Test check: confirm read1 appears as a FASTA header.
+grep -q ">read1" reads.fa
+```
+
+The important Slurm resource lines are the `#SBATCH` directives near the top of the script. They request the debug partition, a small amount of time, and the CPU, memory, node, or GPU resources needed by this smoke test. The `module load` commands prepare the software environment, and `srun` is used when the application should be launched through Slurm across allocated tasks.
+
+## Example output
+
+The following abbreviated result is from the Easley debug regression run used to validate this example.
+
+```text
+Script: seqtk/single-node/slurm-test.sh
+Job ID: 806485
+Slurm state: COMPLETED
+Exit code: 0:0
+Elapsed time: 00:00:01
+Allocated nodes: 1
+Allocated CPUs: 1
+Result: pass
+```
+
+For a successful run, the Slurm state should be `COMPLETED`, the exit code should be `0:0`, and any application-specific checks in the script should pass.

From 77287686f37accadd1a452b90ba1d5ceaad14319 Mon Sep 17 00:00:00 2001
From: Matthew Fricke
Date: Mon, 29 Jun 2026 13:03:35 -0600
Subject: [PATCH 2/5] Make SeqTK QuickByte self-contained

---
 test_seqtk_quickbyte.md | 39 ++++++++++++++-------------------------
 1 file changed, 14 insertions(+), 25 deletions(-)

diff --git a/test_seqtk_quickbyte.md b/test_seqtk_quickbyte.md
index 8e81dd7..7ddeb37 100644
--- a/test_seqtk_quickbyte.md
+++ b/test_seqtk_quickbyte.md
@@ -2,11 +2,7 @@
 
 ## Software Description
 
-Seqtk provides lightweight FASTA/FASTQ tools. This QuickByte is a stub based on the CARC `test-programs` regression suite. The example is intentionally small so it can run on the `debug` partition and serve as a starting point for adapting the application to a real research workload.
-
-Passing test-program examples used for this stub:
-
-- `seqtk/single-node/slurm-test.sh`: `pass`, job `806485`, elapsed `00:00:01`, CPUs `1`
+SeqTK is a lightweight command-line toolkit for working with FASTA and FASTQ sequence files. It can convert between formats, trim reads, sample reads, and perform other common preprocessing steps used in genomics workflows. This QuickByte uses a tiny FASTQ file and converts it to FASTA so you can see the basic Slurm pattern without needing a large sequencing dataset.
 
 ## Example Slurm Script
 
@@ -28,25 +24,20 @@ Save the following as `slurm-test.sh` in the example directory and submit it wit
 #SBATCH --cpus-per-task=1
 #SBATCH --mem=1G
 
-# Test harness: fail fast on errors, unset variables, or failed pipeline commands.
+# Fail fast on errors, unset variables, or failed pipeline commands.
 set -euo pipefail
 
-# Test harness: locate this example directory when submitted from the repo root.
-script_dir="${SLURM_SUBMIT_DIR:-$PWD}"
-if [[ -d "$script_dir/seqtk/single-node" ]]; then
- script_dir="$script_dir/seqtk/single-node"
-fi
-
-# Test harness: create a clean per-job output directory so runs do not collide.
-run_dir="$script_dir/outputs/${SLURM_JOB_NAME}-${SLURM_JOB_ID}"
+# Create a clean per-job output directory inside the submission directory.
+submit_dir="${SLURM_SUBMIT_DIR:-$PWD}"
+run_dir="$submit_dir/outputs/${SLURM_JOB_NAME}-${SLURM_JOB_ID}"
 rm -rf "$run_dir"
 mkdir -p "$run_dir"
 cd "$run_dir"
 
-# Fundamental: load SeqTK.
+# Load SeqTK.
 module --ignore-cache load seqtk/1.4-qhos
 
-# Test harness: write a tiny FASTQ input.
+# Write a tiny FASTQ input.
 cat > reads.fq <<'EOF'
 @read1
 ACGTACGT
@@ -58,11 +49,12 @@ TTTTCCCC
 HHHHHHHH
 EOF
 
-# Fundamental: convert FASTQ to FASTA.
+# Convert FASTQ to FASTA.
 seqtk seq -A reads.fq > reads.fa
-# Test check: confirm both reads are present in FASTA output.
+
+# Confirm both reads are present in FASTA output.
 test "$(grep -c '^>' reads.fa)" -eq 2
-# Test check: confirm read1 appears as a FASTA header.
+# Confirm read1 appears as a FASTA header.
 grep -q ">read1" reads.fa
 ```
 
@@ -70,17 +62,14 @@ The important Slurm resource lines are the `#SBATCH` directives near the top of
 
 ## Example output
 
-The following abbreviated result is from the Easley debug regression run used to validate this example.
+After the job finishes, Slurm should report a completed job with exit code `0:0`. The job directory under `outputs/` should contain the original `reads.fq` file and the converted `reads.fa` file.
 
 ```text
-Script: seqtk/single-node/slurm-test.sh
-Job ID: 806485
 Slurm state: COMPLETED
 Exit code: 0:0
-Elapsed time: 00:00:01
 Allocated nodes: 1
 Allocated CPUs: 1
-Result: pass
+Expected files: reads.fq, reads.fa
 ```
 
-For a successful run, the Slurm state should be `COMPLETED`, the exit code should be `0:0`, and any application-specific checks in the script should pass.
+For a successful run, the Slurm state should be `COMPLETED`, the exit code should be `0:0`, and the checks in the script should pass.

From 509d2530c36cf98ceff67d633cf4fa71e077d1ae Mon Sep 17 00:00:00 2001
From: Maren
Date: Mon, 29 Jun 2026 15:43:29 -0600
Subject: [PATCH 3/5] Updated test_seqtk_quickbyte.md to have clearer instructions

Clarified instructions to be clear.
---
 test_seqtk_quickbyte.md | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/test_seqtk_quickbyte.md b/test_seqtk_quickbyte.md
index 7ddeb37..8cf653f 100644
--- a/test_seqtk_quickbyte.md
+++ b/test_seqtk_quickbyte.md
@@ -6,7 +6,13 @@ SeqTK is a lightweight command-line toolkit for working with FASTA and FASTQ seq
 
 ## Example Slurm Script
 
-Save the following as `slurm-test.sh` in the example directory and submit it with `sbatch slurm-test.sh`.
+First, log in to easley via SSH.
+
+`ssh user@easley.alliance.unm.edu`
+
+Next, navigate to the directory where you would like to work by running `cd `. If you are following along with the tutorial then use the `example` directory. If you do not have an example directory then you can make one with `mkdir example`, then navigate inside the directory.
+
+Then, create the script in that directory. To do this we will use a text editor. You are able to use whatever editor you prefer; however, this tutorial will use nano. Run `nano slurm-test.sh` to create the file. Then, copy the following text and paste it into the file by right-clicking in the terminal (or by using your terminal's paste shortcut).
 
 ```bash
 #!/bin/bash -l
@@ -57,12 +63,25 @@ test "$(grep -c '^>' reads.fa)" -eq 2
 # Confirm read1 appears as a FASTA header.
 grep -q ">read1" reads.fa
 ```
-
 The important Slurm resource lines are the `#SBATCH` directives near the top of the script. They request the debug partition, a small amount of time, and the CPU, memory, node, or GPU resources needed by this smoke test. The `module load` commands prepare the software environment, and `srun` is used when the application should be launched through Slurm across allocated tasks.
 
+Save the file using `Ctrl + X`, then type `y` when prompted. Then, in the terminal use `sbatch slurm-test.sh` to submit the script.
+
 ## Example output
 
-After the job finishes, Slurm should report a completed job with exit code `0:0`. The job directory under `outputs/` should contain the original `reads.fq` file and the converted `reads.fa` file.
+After the job finishes, Slurm should report a completed job with exit code `0:0`. To check this use `squeue` for running jobs and `sacct -j ` for completed jobs.
+
+The `outputs` directory within your `example` directory will contain another directory. This directory has the original `reads.fq` file and the converted `reads.fa` file in it.
+
+The `reads.fa` file will contain (Use `cat reads.fa` to look at the file):
+
+```text
+>read1
+ACGTACGT
+>read2
+TTTTCCCC
+```
+For a successful run, the Slurm state should be `COMPLETED`, the exit code should be `0:0`, and the checks in the script should pass.
 
 ```text
 Slurm state: COMPLETED
@@ -72,4 +91,4 @@ Allocated CPUs: 1
 Expected files: reads.fq, reads.fa
 ```
 
-For a successful run, the Slurm state should be `COMPLETED`, the exit code should be `0:0`, and the checks in the script should pass.
+*This quickbyte was verified on 6/29/2026*

From 698c92434f83a866e56a30d4e2eb36858d32f4f6 Mon Sep 17 00:00:00 2001
From: Maren
Date: Tue, 30 Jun 2026 15:02:48 -0600
Subject: [PATCH 4/5] Update test_seqtk_quickbyte.md

Fixed example directory confusion
---
 test_seqtk_quickbyte.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/test_seqtk_quickbyte.md b/test_seqtk_quickbyte.md
index 8cf653f..9464e73 100644
--- a/test_seqtk_quickbyte.md
+++ b/test_seqtk_quickbyte.md
@@ -10,7 +10,7 @@ First, log in to easley via SSH.
 
 `ssh user@easley.alliance.unm.edu`
 
-Next, navigate to the directory where you would like to work by running `cd `. If you are following along with the tutorial then use the `example` directory. If you do not have an example directory then you can make one with `mkdir example`, then navigate inside the directory.
+Next, navigate to the directory where you would like to work by running `cd `. If you are following along with the QuickByte and you would like to use a sepreate directory, then you can make one with `mkdir seqtk_example`, then navigate inside the directory.
 
 Then, create the script in that directory. To do this we will use a text editor. You are able to use whatever editor you prefer; however, this tutorial will use nano. Run `nano slurm-test.sh` to create the file. Then, copy the following text and paste it into the file by right-clicking in the terminal (or by using your terminal's paste shortcut).
 
@@ -63,7 +63,7 @@ test "$(grep -c '^>' reads.fa)" -eq 2
 # Confirm read1 appears as a FASTA header.
 grep -q ">read1" reads.fa
 ```
-The important Slurm resource lines are the `#SBATCH` directives near the top of the script. They request the debug partition, a small amount of time, and the CPU, memory, node, or GPU resources needed by this smoke test. The `module load` commands prepare the software environment, and `srun` is used when the application should be launched through Slurm across allocated tasks.
+The important Slurm resource lines are the `#SBATCH` directives near the top of the script. They request the debug partition, a small amount of time, and the CPU, memory, node, or GPU resources needed by this smoke test. The `module load` commands prepare the software environment.
 
 Save the file using `Ctrl + X`, then type `y` when prompted. Then, in the terminal use `sbatch slurm-test.sh` to submit the script.
 
@@ -71,7 +71,7 @@ Save the file using `Ctrl + X`, then type `y` when prompted. Then, in the termin
 
 After the job finishes, Slurm should report a completed job with exit code `0:0`. To check this use `squeue` for running jobs and `sacct -j ` for completed jobs.
 
-The `outputs` directory within your `example` directory will contain another directory. This directory has the original `reads.fq` file and the converted `reads.fa` file in it.
+The `outputs` directory within your `seqtk_example` directory will contain another directory. This directory has the original `reads.fq` file and the converted `reads.fa` file in it.
 
 The `reads.fa` file will contain (Use `cat reads.fa` to look at the file):
 
@@ -91,4 +91,4 @@ Allocated CPUs: 1
 Expected files: reads.fq, reads.fa
 ```
 
-*This quickbyte was verified on 6/29/2026*
+*This quickbyte was verified on 6/30/2026*

From f948512ba2d759968f002b7004ae8f2aa51eb516 Mon Sep 17 00:00:00 2001
From: Maren
Date: Thu, 2 Jul 2026 13:42:06 -0600
Subject: [PATCH 5/5] Update test_seqtk_quickbyte.md

Made some small changes for clarification and grammar.
---
 test_seqtk_quickbyte.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/test_seqtk_quickbyte.md b/test_seqtk_quickbyte.md
index 9464e73..dd7a5f6 100644
--- a/test_seqtk_quickbyte.md
+++ b/test_seqtk_quickbyte.md
@@ -12,7 +12,7 @@ First, log in to easley via SSH.
 
 Next, navigate to the directory where you would like to work by running `cd `. If you are following along with the QuickByte and you would like to use a sepreate directory, then you can make one with `mkdir seqtk_example`, then navigate inside the directory.
 
-Then, create the script in that directory. To do this we will use a text editor. You are able to use whatever editor you prefer; however, this tutorial will use nano. Run `nano slurm-test.sh` to create the file. Then, copy the following text and paste it into the file by right-clicking in the terminal (or by using your terminal's paste shortcut).
+Create the script in that directory. To do this we will use a text editor. You are able to use whatever editor you prefer; however, this QuickByte will use nano. Run `nano slurm-test.sh` to create the file. Then, copy the following text and paste it into the file by right-clicking in the terminal (or by using your terminal's paste shortcut).
 
 ```bash
 #!/bin/bash -l
@@ -63,19 +63,20 @@ test "$(grep -c '^>' reads.fa)" -eq 2
 # Confirm read1 appears as a FASTA header.
 grep -q ">read1" reads.fa
 ```
-The important Slurm resource lines are the `#SBATCH` directives near the top of the script. They request the debug partition, a small amount of time, and the CPU, memory, node, or GPU resources needed by this smoke test. The `module load` commands prepare the software environment.
+The important Slurm resource lines are the `#SBATCH` directives near the top of the script. They request the debug partition, a small amount of time, and the CPU, memory, node, or GPU resources needed by this smoke test. The `module load` commands prepare the software environment. The script writes a FASTQ input and converts that input into a FASTA output.
 
 Save the file using `Ctrl + X`, then type `y` when prompted. Then, in the terminal use `sbatch slurm-test.sh` to submit the script.
 
 ## Example output
 
-After the job finishes, Slurm should report a completed job with exit code `0:0`. To check this use `squeue` for running jobs and `sacct -j ` for completed jobs.
+After the job finishes, Slurm should report a completed job with exit code `0:0`. To check this use `squeue` for running jobs and `sacct -j ` for completed jobs. The job ID is the number slurm assigns to job when you submit it.
 
 The `outputs` directory within your `seqtk_example` directory will contain another directory. This directory has the original `reads.fq` file and the converted `reads.fa` file in it.
 
 The `reads.fa` file will contain (Use `cat reads.fa` to look at the file):
 
 ```text
+cat reads.fa
 >read1
 ACGTACGT
 >read2