Genome Assembly for Abudefduf

This repository hold scripts and select results used to assemble and analyse the genomes of Abudefduf saxatilis and Abudefduf troschelii for the paper "Comparative Genomics of Trans-isthmian Damselfishes (Abudefduf saxatilis and A.troschelii)" by Tracy et al., 2026 (accepted for publication in GBE).

Overall this repository includes information on the de-novo assemblies for PacBio genomes, and conduct subsequent comparative analyses such as synteny, demographic history, and gene family expansion.

Easley job submission

All scripts were run using Auburn University's Easley Supercomputer using slurm. As such, all scripts have a similar setup at the top with SBATCH parameters that will need to be modified for your own use. I have removed my own job names, partition names, and user names and indicated where you will need to enter your own. I have left the memory, ntasks, and time allotted for the run to help others who may be using the same supercomputer for these analyses. Most of these parameters are likely overkill, however these are what I used and they worked for me.

Below is an example of what the SBATCH component of each script looks like. THe memory, ntasks, and time parameters are different for each script in this repository.

#!/bin/bash
#SBATCH --job-name=ENTERJOBNAME             # job name
#SBATCH --mem=180G
#SBATCH --ntasks=30                  # number of tasks across all nodes
#SBATCH --partition=ENTERPARTITIONNAME          # name of partition to submit job
#SBATCH --time=48:00:00              # Run time (D-HH:MM:SS)
#SBATCH --output=job-%j.out          # Output file. %j is replaced with job ID
#SBATCH --error=job-%j.err           # Error file. %j is replaced with job ID
#SBATCH --mail-type=ALL              # will send email for begin,end,fail
#SBATCH --mail-user=ENTERUSERNAME@auburn.edu

Convert File Format

if not already in fasta format run bamtofastx.sh script

source activate genome_env2


bam2fasta -o Outfilename Infilename.bam 

conda deactivate

De-novo genome assembly

run hifiasm.sh script

module load hifiasm/0.15.1


hifiasm -o atro_assembly.asm -t 30 /home/cbt0022/A-troschelii-genome/files.rc.byu.edu/owen_fish/17G/ccs/m64140_211120_040313.hifi_reads.fasta.gz

download .gfa files to visualize in bandage on local computer

need to change path and filenames

scp cbt0022@easley.auburn.edu:~/A-sax-genome/analysis/asax_assembly.asm.bp.p_ctg.gfa . 
scp cbt0022@easley.auburn.edu:~/A-sax-genome/analysis/asax_assembly.asm.bp.hap2.p_ctg.gfa .
scp cbt0022@easley.auburn.edu:~/A-sax-genome/analysis/asax_assembly.asm.bp.hap1.p_ctg.gfa .

get primary contigs in fasta from hifiasm output --- really not sure where this came from or what this does need to change file names

awk '/^S/{print ">"$2;print $3}' asax_assembly.asm.bp.p_ctg.gfa > asax_assembly.asm.p_ctg.fa  # get primary contigs in FASTA

convert gfa to fasta (on desktop)

 ./gfatools gfa2fa ../asax_assembly.asm.bp.p_ctg.gfa > Asax_assembly.fa

pyfastx to get stats (on desktop)

pyfastx stat Asax_assembly.fa > stats.txt

or on Easley

pyfastx info atro_assembly.asm.bp.p_ctg.fa > Atro_stats.txt

Map Raw Reads back to Assembled Genome

#!/bin/bash
#SBATCH --job-name=samtools            # job name
#SBATCH --ntasks=1                # number of tasks across all nodes
#SBATCH --mem=80G
#SBATCH --partition=investor_bg4           # name of partition to submit job
#SBATCH --time=30:00:00              # Run time (D-HH:MM:SS)
#SBATCH --output=job-%j.out          # Output file. %j is replaced with job ID
#SBATCH --error=job-%j.err           # Error file. %j is replaced with job ID
#SBATCH --mail-type=FAIL              # will send email for begin,end,fail
#SBATCH --mail-user=cbt0022@auburn.edu

#module load python/anaconda/
#module load bwa 
module load samtools

#source activate blobtools2
#module load mpich/3.2

#index genome assembly
bwa index -p ASAX  ~/A-sax-genome/analysis/asax_assembly.asm.p_ctg.fa


##run bwa

#mpiexec -np 40 
bwa mem -t 40 ASAX ~/A-sax-genome/analysis/Abudefduf_genome.fasta.gz > Asax_bwa_aln.sam
##convert to bam and sort with samtools
samtools view -bu Asax_bwa_aln.sam | samtools sort -T Asax -O bam -l 9 -m 30G -o Asax_bwa_aln_sorted.bam


#conda deactivate

Decontamination

Decontaminate genome using blobtools on Auburn University's Easley Cluster.

Input data for blobtools:
-i /home/cbt0022/A-sax-genome/analysis/asax_assembly.asm.p_ctg.fa #This is the assembled genome output from hifiasm
-t blast.out
-b Asax_bwa_aln_sorted.bam #Raw reads mapped to genome -o A-sax_ref_blob #OUTPUT

See README.md in Decontamination folder for code used and results.

Assembly with reference genome

Reference genome to use: Nemo_v1.1 GenBank: GCA_003047355.2

only chromosome level assembly in Pomacentridae

download reference genome to easley:

wget --recursive --no-host-directories --cut-dirs=6 ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/047/355/GCA_003047355.2_Nemo_v1.1/ -P reference/

run this on command line:

module load python/anaconda
export CONDA_PKGS_DIRS=~/.conda/pkgs
#already created the genome_env3 environment
#conda create -n genome_env_minimap
source activate genome_env_minimap
conda install -c bioconda minimap2
conda deactivate

minimap.sh script:

#!/bin/bash
#SBATCH --job-name=minimap_abu             # job name
#SBATCH --ntasks=30                  # number of tasks across all nodes
#SBATCH --mem=120G
#SBATCH --partition=general          # name of partition to submit job
#SBATCH --time=48:00:00              # Run time (D-HH:MM:SS)
#SBATCH --output=job-%j.out          # Output file. %j is replaced with job ID
#SBATCH --error=job-%j.err           # Error file. %j is replaced with job ID
#SBATCH --mail-type=ALL              # will send email for begin,end,fail
#SBATCH --mail-user=cbt0022@auburn.edu


source activate genome_env_minimap

minimap2 -ax map-pb  ~/reference/GCA_003047355.2_Nemo_v1.1/Nemo_v1.1_genomic.fa.gz ~/A-sax-genome/analysis/asax_assembly.asm.p_ctg.fa > aln.sam   # for PacBio CLR reads

sbatch minimap.sh

Name		Name	Last commit message	Last commit date
Latest commit History 112 Commits
CAFE		CAFE
Decontamination		Decontamination
GO_MWU		GO_MWU
PSMC		PSMC
CAFE.md		CAFE.md
README.md		README.md
bam2fastx.sh		bam2fastx.sh
hifiasm.sh		hifiasm.sh
psmc.md		psmc.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Genome Assembly for Abudefduf

Table of Contents

Easley job submission

Convert File Format

De-novo genome assembly

Map Raw Reads back to Assembled Genome

Decontamination

Assembly with reference genome

BUSCO

Genome Annotation

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Genome Assembly for Abudefduf

Table of Contents

Easley job submission

Convert File Format

De-novo genome assembly

Map Raw Reads back to Assembled Genome

Decontamination

Assembly with reference genome

BUSCO

Genome Annotation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages