- Download data
- Subset samples
- Calculate habitat metrics per trap
- Fetch weather information per catch lot
- Visualise which catch lots have not been sequenced yet
- Calculate Shannon and Simpson indices per catch lot
/lustre/scratch126/tol/teams/lawniczak/projects/bioscan/processing/required_files
• collection_time_codes.csv (time codes translated to numeric values)
• mozz_to_partner.csv (older mozz plates translated to partner codes)
• trap_to_partner.csv (catch lot location translated to trap name)
& /lustre/scratch126/tol/teams/lawniczak/projects/bioscan/bioscan_qc/mbrave_batch_data
This directory stores all mBRAVE files, which hold information on all samples that have been sequenced.
& /lustre/scratch126/tol/teams/lawniczak/projects/bioscan/processing/maps
This directory contains all CEH (land cover) and Met Office (weather) maps, as well as the general files required to visualise UK maps.
These files must be manually updated when new Met Office or CEH data appears or when new partners join BIOSCAN.
/lustre/scratch126/tol/teams/lawniczak/projects/bioscan/100k_paper/code
/lustre/scratch126/tol/teams/lawniczak/projects/bioscan/100k_paper/output
&
/lustre/scratch126/tol/teams/lawniczak/projects/bioscan/100k_paper/plots
To run, execute:
bsub < 00_manifest_fetch.sh
The code creates a sts_manifests_[date].tsv file, which contains all samples present in the manifests submitted by the partners (both sequenced and not sequenced).
To run interactively, connect to RStudio and load the file:
module load HGI/softpack/users/aw43/BOLDconnectR_bioscan/2
module load HGI/softpack/users/aw43/aw43_bioscan_habitat_complexity-2/3
module load rstudio
rstudio start --cpus 10 -M 100000 --queue normal --home /lustre/scratch126/tol/teams/lawniczak/projects/bioscan/100k_paper/code --pwd /lustre/scratch126/tol/teams/lawniczak/projects/bioscan/100k_paper/code -g team222
This can be done with all the R scripts in this repository. Remember to load the specified softpack environments.
Or submit as a job:
bsub < 01_subset_data.sh
This script takes the output of 00_manifest_fetch.sh and subsets the BIOSCAN data for further processing. It removes control samples and samples from all non-BIOSCAN partners.
NOTE: if any new non-BIOSCAN samples are being processed and go through the standard BIOSCAN QC, these partner codes must be added manually within the 01_subset_data.R script.
Currently included: "BGEP", "SNST", "JARO", "BGKU", "BSN", "BGEG", "OXHP", "POMS", "AYDI", "BGPT", "-BGE", "-TOL"
The script retains only samples caught using Malaise traps, from 2021 onwards.
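The filtering rules above can be sketched in a few lines. This is a minimal illustration in Python, not the code in 01_subset_data.R. The field names (`sample_id`, `is_control`, `trap_type`, `year`), the example IDs, and the rule that a partner code excludes a sample when it occurs anywhere in the sample ID are all assumptions made for the example.

```python
# Partner codes copied from the list above; everything else here is hypothetical.
EXCLUDED_CODES = ["BGEP", "SNST", "JARO", "BGKU", "BSN", "BGEG",
                  "OXHP", "POMS", "AYDI", "BGPT", "-BGE", "-TOL"]


def keep_sample(sample, excluded=EXCLUDED_CODES, min_year=2021):
    """Return True if a manifest row passes the filters described above."""
    if sample["is_control"]:
        return False
    # Assumption: a code excludes a sample when it appears anywhere in the ID.
    if any(code in sample["sample_id"] for code in excluded):
        return False
    if sample["trap_type"] != "Malaise":
        return False
    return sample["year"] >= min_year


# Made-up rows, one per filter.
samples = [
    {"sample_id": "ABCD_001", "is_control": False, "trap_type": "Malaise", "year": 2022},
    {"sample_id": "BGEP_014", "is_control": False, "trap_type": "Malaise", "year": 2022},
    {"sample_id": "ABCD_002", "is_control": True, "trap_type": "Malaise", "year": 2022},
    {"sample_id": "ABCD_003", "is_control": False, "trap_type": "Pitfall", "year": 2022},
    {"sample_id": "ABCD_004", "is_control": False, "trap_type": "Malaise", "year": 2020},
    {"sample_id": "WXYZ-TOL_7", "is_control": False, "trap_type": "Malaise", "year": 2023},
]

kept = [s["sample_id"] for s in samples if keep_sample(s)]
print(kept)
```

Keeping the excluded codes in a single list mirrors the note above: adding a new non-BIOSCAN partner then means editing one place.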
Additional columns added to the data include: trap name, partner name, region, plate, day, month, year, 24h sampling selection, and whether any samples from a given plate / catch lot have been sequenced [TRUE/FALSE].
The script also corrects previously recognised mistakes in the coordinates of some traps. If these have already been corrected in the Portal, the script won't introduce any changes.
The script assesses the completeness of the data, using the mBRAVE input files to check whether any BIOSCAN samples are missing the QC output information.
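Conceptually, a completeness check of this kind is a set difference between the sample IDs in the manifests and those found in the mBRAVE files. The sketch below is a hedged Python illustration with made-up IDs; the actual matching keys and logic in 01_subset_data.R may differ.

```python
def missing_qc(manifest_ids, mbrave_ids):
    """Return manifest sample IDs that have no matching mBRAVE record, sorted."""
    return sorted(set(manifest_ids) - set(mbrave_ids))


# Hypothetical sample IDs, for illustration only.
manifest_ids = ["S001", "S002", "S003", "S004"]
mbrave_ids = ["S001", "S003", "S999"]

missing = missing_qc(manifest_ids, mbrave_ids)
print(missing)
```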
NOTE: the BOLDconnectR library requires an API key, passed to the bold.apikey() function, which you can obtain from your BOLD account.
The output file: BIOSCAN_100k_samples_corrected[date].csv
Do not run this script interactively, as it would be very slow. Instead, submit it as a job:
bsub < 02A_land_cover_maps.sh
This script generates habitat-type summaries around BIOSCAN trap locations using 10 m resolution CEH 2024 land cover maps.
NI and GB are processed separately due to the use of different CEH maps.
For each trap, the script:
• Loads trap coordinates from the flat file trap_to_partner.csv
• Loads land cover raster layers for Great Britain (GB) and Northern Ireland (NI)
• Creates circular buffers of multiple radii (25, 50, 100, 500, 1000 m); these sizes are currently set in the script, so edit them there if you need larger buffers
• Extracts all raster pixels intersecting each buffer
• Converts raster class codes into habitat labels
• Calculates the number of pixels per trap per buffer
• Saves one output csv per buffer size, separately for GB and NI
The output files: [buffer size]_buffer_[GBL or NIL]_2024_[date].csv
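The buffer steps above can be illustrated on a toy raster. This Python sketch is not the method used in 02A_land_cover_maps.R: it counts a cell when its centre lies inside the buffer, which is a simplification of "pixels intersecting the buffer". The class codes and labels are illustrative only, not the actual CEH legend.

```python
import math
from collections import Counter

# Illustrative class codes and labels only; not the real CEH land cover legend.
LABELS = {1: "Woodland", 2: "Grassland", 3: "Arable"}


def count_habitats_in_buffer(grid, cell_size, centre, radius):
    """Count habitat labels of cells whose centre lies within `radius` of `centre`.

    `grid` is a list of rows of class codes. Cell (row, col) has its centre at
    ((col + 0.5) * cell_size, (row + 0.5) * cell_size), in the units of `radius`.
    """
    cx, cy = centre
    counts = Counter()
    for row, codes in enumerate(grid):
        for col, code in enumerate(codes):
            x = (col + 0.5) * cell_size
            y = (row + 0.5) * cell_size
            if math.hypot(x - cx, y - cy) <= radius:
                counts[LABELS[code]] += 1
    return counts


# A 5 x 5 toy raster with 10 m cells and a trap at its centre.
grid = [
    [1, 1, 1, 2, 2],
    [1, 1, 2, 2, 2],
    [1, 1, 2, 2, 3],
    [1, 2, 2, 3, 3],
    [2, 2, 3, 3, 3],
]

counts = count_habitats_in_buffer(grid, cell_size=10, centre=(25, 25), radius=25)
print(dict(counts))
```

With a 25 m radius the four corner cells fall outside the buffer, so 21 of the 25 cells are counted.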
To run from the farm, submit:
bsub < 02B_land_cover_maps.sh
This script loads the most recent csv files produced by 02A_land_cover_maps.R, extracts metadata, and stores all datasets in a single rds file, preparing them for downstream analyses.
The output file: 02B_working_sets_radius.rds
To run from the farm, submit:
bsub < 02C_land_cover_maps.sh
Or run via RStudio.
This script calculates habitat diversity indices and ratios of each habitat type for each BIOSCAN trap. The script uses the 02B_working_sets_radius.rds file as input.
Habitat ratios are calculated as ratio = habitat_pixels / total_pixels_within_buffer for:
• Arable & horticulture
• Semi-natural grasslands
• Forest (broadleaf + coniferous)
• Urban & suburban & gardens
• Improved grasslands
• Coastal habitats
• Heather / mountain / bog complexes
• Freshwater
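As a worked illustration of the ratio formula above, the Python sketch below merges broadleaf and coniferous pixels into a single Forest group and then divides each group by the total pixel count in the buffer. The input labels and pixel counts are made up, and the grouping table is an assumption about how the classes might be merged, not the script's own lookup.

```python
# Hypothetical mapping from raw land cover labels to the grouped habitats above.
GROUPS = {
    "Broadleaved woodland": "Forest",
    "Coniferous woodland": "Forest",
}


def habitat_ratios(pixel_counts):
    """Return ratio = habitat_pixels / total_pixels_within_buffer per grouped habitat."""
    grouped = {}
    for label, n in pixel_counts.items():
        group = GROUPS.get(label, label)  # labels without a group are kept as-is
        grouped[group] = grouped.get(group, 0) + n
    total = sum(grouped.values())
    if total == 0:
        return {}
    return {group: n / total for group, n in grouped.items()}


# Made-up pixel counts for one trap and one buffer radius.
pixel_counts = {
    "Broadleaved woodland": 30,
    "Coniferous woodland": 10,
    "Arable & horticulture": 40,
    "Freshwater": 20,
}

ratios = habitat_ratios(pixel_counts)
print(ratios)
```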
The script also returns:
• Number of unique habitat types per trap
• Dominant habitat type and its percentage
• Shannon diversity index (richness + evenness)
• Simpson diversity index (evenness / dominance)
Each metric is exported to the output directory as a standalone CSV summarising traps & buffers.
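For reference, the per-trap summary metrics listed above can be computed from pixel counts as in the Python sketch below. It assumes the natural-log Shannon index, H = -sum(p * ln p), and the Gini-Simpson form, 1 - sum(p^2). The exact variants used in 02C_land_cover_maps.R are not stated here, so treat these choices as assumptions; the input counts are made up.

```python
import math


def diversity_summary(pixel_counts):
    """Summarise habitat diversity for one trap and one buffer radius."""
    counts = {h: n for h, n in pixel_counts.items() if n > 0}
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no pixels in buffer")
    props = [n / total for n in counts.values()]
    dominant = max(counts, key=counts.get)
    return {
        "n_habitats": len(counts),
        "dominant": dominant,
        "dominant_pct": 100 * counts[dominant] / total,
        # Shannon index with natural log (assumed variant).
        "shannon": -sum(p * math.log(p) for p in props),
        # Gini-Simpson form, 1 - sum(p^2) (assumed variant).
        "simpson": 1.0 - sum(p * p for p in props),
    }


# Made-up pixel counts; the zero-count habitat is ignored.
summary = diversity_summary({"Forest": 60, "Arable": 30, "Freshwater": 10, "Coastal": 0})
print(summary)
```

A trap covered by a single habitat gives 0 for both indices, and both increase as pixels are spread more evenly across more habitat types.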
Generated plots include:
• Heatmaps showing spatial patterns of each habitat ratio across all traps
• Correlation plots among buffers for each metric
• Shannon-Simpson scatterplots per buffer radius
• Cross-radius correlations of Shannon diversity
• Summary plots ranking traps by mean and median diversity indices
(is this python??)
Add functional diversity here.
Success per catch lot is calculated here and returned in the output data frame.
For all models, remember to subset only 24h samples and catch lots with good performance. Depending on the analysis, also subset the months based on the 24h sampling distribution (winter excluded).
How to deal with the winter 24h+ samples?
higher-level comparisons