Samplesheet Validator

This tool is designed to validate NGS samplesheets prior to downstream processing by performing a series of checks.

It can be used as a standalone process, but was designed for integration into automated workflows through instantiation of the SamplesheetCheck class, which records the validation outcome in a boolean flag attribute (self.errors) and the error messages in a dictionary (self.errors_dict).
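The pattern of recording an overall outcome flag alongside a dictionary of error messages can be illustrated with a minimal stand-in class. This is a hypothetical sketch, not the SamplesheetCheck implementation. The class name `MiniCheck`, the `record_error` method, the convention that `errors` becomes True once any check fails, and the dictionary layout (check name mapped to a list of messages) are all assumptions.

```python
class MiniCheck:
    """Hypothetical stand-in showing an outcome flag plus a dictionary of error messages."""

    def __init__(self) -> None:
        self.errors = False  # assumed convention: True once any check has failed
        self.errors_dict = {}  # assumed layout: check name -> list of messages

    def record_error(self, check_name: str, message: str) -> None:
        # Flip the outcome flag and file the message under the check that produced it
        self.errors = True
        self.errors_dict.setdefault(check_name, []).append(message)


check = MiniCheck()
check.record_error("path_check", "Samplesheet path does not exist")
print(check.errors)       # True
print(check.errors_dict)  # {'path_check': ['Samplesheet path does not exist']}
```

Keeping the flag separate from the dictionary lets a calling workflow branch on a single boolean. The detailed messages remain available for reporting.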

Use case

The tool has been designed for:

Illumina sequencing runs, with samplesheet file names expected to end in "_SampleSheet.csv".
AVITI runs.

Expected run types include:

Panel based NGS testing
TSO500
Oncodeep
Archer
MSK

Please note this tool has been specifically designed for the Genome Informatics Service at Synnovis (including the use of the seglh-naming library) and therefore might require modifications for integration into alternative workflows.

Protocol

Samplesheet validation is carried out in a series of consecutive steps, and any errors identified are recorded in the log file as per the config file.
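The consecutive-step design can be sketched as an ordered runner that logs each failure to a file and stops early for development runs, mirroring the dev-run rule in the checks listed here. This is a simplified, hypothetical illustration rather than the tool's own code. The function and check names, the `run` dictionary fields (`size_bytes`, `is_dev`, `sample_id`, `sample_name`) and the split into "early" and "clinical-only" checks are assumptions.

```python
import logging
import tempfile
from pathlib import Path
from typing import Callable, Optional

# A hypothetical check takes parsed run details and returns an error message or None
Check = Callable[[dict], Optional[str]]


def run_checks(run: dict, early: list[Check], clinical_only: list[Check],
               logger: logging.Logger) -> dict:
    """Run checks in order, logging each failure; dev runs stop before clinical-only checks."""
    errors = {}

    def run_one(check: Check) -> None:
        message = check(run)
        if message is not None:
            errors[check.__name__] = message
            logger.error("%s: %s", check.__name__, message)

    for check in early:
        run_one(check)
    if run.get("is_dev"):
        logger.info("Development run detected; skipping remaining checks")
        return errors
    for check in clinical_only:
        run_one(check)
    return errors


def not_empty(run: dict) -> Optional[str]:
    return None if run["size_bytes"] > 10 else "Samplesheet is empty"


def ids_match(run: dict) -> Optional[str]:
    return None if run["sample_id"] == run["sample_name"] else "Sample_ID and Sample_Name differ"


# Log to a throwaway file so the example has no side effects on real log directories
log_path = Path(tempfile.mkdtemp()) / "sketch_samplesheet_validator.log"
logger = logging.getLogger("ss_sketch")
logger.setLevel(logging.INFO)
logger.addHandler(logging.FileHandler(log_path))

dev_run = {"size_bytes": 5, "is_dev": True, "sample_id": "a", "sample_name": "b"}
clinical_run = {"size_bytes": 500, "is_dev": False, "sample_id": "a", "sample_name": "b"}
print(run_checks(dev_run, [not_empty], [ids_match], logger))       # {'not_empty': 'Samplesheet is empty'}
print(run_checks(clinical_run, [not_empty], [ids_match], logger))  # {'ids_match': 'Sample_ID and Sample_Name differ'}
```

The dev run never reaches `ids_match`, even though its IDs differ. Only runs that get past the dev-run check are subject to the clinical checks.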

Checks:

Samplesheet path provided is valid.
Samplesheet matches expected naming:
- Illumina: checked against the seglh-naming library
- AVITI: samplesheet name matches run folder name.
The sequencer_id is in the allowed/validated list of sequencers for that run type.
The samplesheet is not empty (>10 bytes)
Whether the run is a development run. N.B. If the run is a dev run, no further samplesheet validation is performed; the remaining checks are only carried out for clinical runs.
Samplesheet contains the minimum expected section headers
Content in columns "Sample_ID" and "Sample_Name" match for each sample in the samplesheet
Samplesheet doesn't contain any illegal characters
Sample name matches expected naming convention for all samples. Assessed against seglh-naming library.
The test code (pannumber) for each sample is in the list of expected test codes for the run type.
Whether any TSO samples are included on the run (if so, a boolean attribute is set to True)
Whether any OKD samples are included on the run (if so, a boolean attribute is set to True)
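Several of the content checks above can be sketched for an Illumina-style samplesheet with a [Data] section. This is a simplified, hypothetical reimplementation and not the tool's code. It does not use seglh-naming. The allowed-character set, the "Pan" plus digits pattern used to pull out the test code, the panel string format (e.g. "Pan4567") and the function names are all assumptions. The section-header, sequencer and naming-convention checks are left out.

```python
import csv
import re
import tempfile
from pathlib import Path

# Assumption: allowed characters in [Data] values; the real tool's rule may differ
ILLEGAL_CHARS = re.compile(r"[^A-Za-z0-9_\-.]")
# Assumption: the test code appears in the sample name as "Pan" followed by digits
PAN_RE = re.compile(r"Pan\d+")


def data_rows(text: str) -> list:
    """Return rows under the [Data] section as dicts keyed by column name."""
    lines = text.splitlines()
    start = next((i for i, ln in enumerate(lines) if ln.strip().startswith("[Data]")), None)
    return [] if start is None else list(csv.DictReader(lines[start + 1:]))


def check_samplesheet(path: Path, panels: set, tso_panels: set, okd_panels: set) -> dict:
    """Hypothetical subset of the checks: name, size, IDs, characters, test codes, TSO/OKD."""
    result = {"errors": [], "tso": False, "okd": False}
    if not path.name.endswith("_SampleSheet.csv"):
        result["errors"].append("Name does not end in _SampleSheet.csv")
    if path.stat().st_size <= 10:
        result["errors"].append("Samplesheet is empty (<= 10 bytes)")
        return result
    for row in data_rows(path.read_text()):
        sample_id = row.get("Sample_ID") or ""
        sample_name = row.get("Sample_Name") or ""
        if sample_id != sample_name:
            result["errors"].append(f"Sample_ID/Sample_Name mismatch: {sample_id}")
        # Only string values are scanned; csv stores surplus fields as a list under None
        if any(isinstance(v, str) and ILLEGAL_CHARS.search(v) for v in row.values()):
            result["errors"].append(f"Illegal character in row: {sample_id}")
        match = PAN_RE.search(sample_name)
        pan = match.group(0) if match else None
        if pan not in panels:
            result["errors"].append(f"Unexpected test code {pan} for {sample_name}")
        result["tso"] |= pan in tso_panels
        result["okd"] |= pan in okd_panels
    return result


sheet = Path(tempfile.mkdtemp()) / "RUN1_SampleSheet.csv"
sheet.write_text(
    "[Header]\nIEMFileVersion,4\n[Data]\n"
    "Sample_ID,Sample_Name\n"
    "NGS1_01_Pan4567,NGS1_01_Pan4567\n"
    "NGS1_02_Pan9999,NGS1_02_Pan9999\n"
)
result = check_samplesheet(sheet, panels={"Pan4567", "Pan9999"}, tso_panels={"Pan9999"}, okd_panels=set())
print(result)  # {'errors': [], 'tso': True, 'okd': False}
```

As in the tool, the TSO and OKD outcomes are flags set whenever any sample matches, rather than errors.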

Installation & Usage

From Python package

Clone a copy of the repository locally

git clone https://github.com/moka-guys/samplesheet_validator.git
cd into the project root directory
Install from python package

python3 setup.py install

N.B. Requires setuptools to be installed. Use the --user flag, or install into a virtualenv/pipenv, if not installing globally.

Execute functionality from within a Python script:

from samplesheet_validator.samplesheet_validator import SamplesheetCheck

sscheck_obj = SamplesheetCheck(
    samplesheet_path,  # str
    sequencer_ids,  # list
    panels,  # list
    tso_panels,  # list
    okd_panels, # list
    dev_pannos,  # list
    logdir,  # str
    illumina, # bool
    runname, # str
)
sscheck_obj.ss_checks()  # Carry out samplesheet validation

print(sscheck_obj.errors_dict)  # View the dictionary of error messages
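When the class is embedded in a workflow, a caller will typically want to halt on a failed validation and surface the collected messages. The following is a minimal, hypothetical sketch. The helper name `require_valid` is an illustration, and it assumes that `errors` is True when at least one check failed. Stand-in objects with the same two attributes are used so the example runs without the package installed.

```python
from types import SimpleNamespace


def require_valid(sscheck_obj) -> None:
    """Raise if validation failed, including the collected messages.

    Assumes `errors` is True when at least one check failed and that
    `errors_dict` maps each failing check to its message(s).
    """
    if sscheck_obj.errors:
        details = "; ".join(f"{check}: {msg}" for check, msg in sscheck_obj.errors_dict.items())
        raise RuntimeError(f"Samplesheet validation failed: {details}")


# Stand-in objects exposing the same two attributes, for illustration only
passed = SimpleNamespace(errors=False, errors_dict={})
failed = SimpleNamespace(errors=True, errors_dict={"sample_names": "Sample_ID != Sample_Name"})
require_valid(passed)  # returns None; downstream processing may continue
try:
    require_valid(failed)
except RuntimeError as exc:
    print(exc)  # Samplesheet validation failed: sample_names: Sample_ID != Sample_Name
```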

Command line

To use the validator from the command line, set up an environment as below:

python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt

The script can then be executed as follows:

usage: Used to validate a samplesheet using the seglh-naming conventions

Given an input samplesheet, will validate the samplesheet using seglh-naming conventions and output a logfile

options:
  -h, --help            show this help message and exit
  -S SAMPLESHEET_PATH, --samplesheet_path SAMPLESHEET_PATH
                        Path to samplesheet requiring validation
  -SI SEQUENCER_IDS, --sequencer_ids SEQUENCER_IDS
                        Comma separated string of allowed sequencer IDs
  -P PANELS, --panels PANELS
                        Comma separated string of allowed panel numbers
  -T TSO_PANELS, --tso_panels TSO_PANELS
                        Comma separated string of tso panels
  -O OKD_PANELS, --okd_panels OKD_PANELS
                        Comma separated string of okd panels
  -D DEV_PANNOS, --dev_pannos DEV_PANNOS
                        Comma separated development pan numbers
  -L LOGDIR, --logdir LOGDIR
                        Directory to save the output logfile to
  -NSH NO_STREAM_HANDLER, --no_stream_handler NO_STREAM_HANDLER
                        Provide flag when we don't want a stream handler (prevents
                        duplication of log messages to terminal if using another
                        logging instance)
  -R RUN_FOLDER_NAME, --runname RUN_FOLDER_NAME
                        Name of the run folder being processed (str)
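Because most options expect comma-separated strings, a calling script holding Python lists needs to join them before invoking the validator. The sketch below builds an argument list from the options documented above. The helper name and all values are hypothetical placeholders. The script's entry point is not given here, so it is deliberately left out. -NSH is omitted because it is unclear from the help text whether it takes a value.

```python
import shlex


def build_cli_args(samplesheet_path: str, sequencer_ids: list, panels: list,
                   tso_panels: list, okd_panels: list, dev_pannos: list,
                   logdir: str, runname: str) -> list:
    """Build the documented options, joining each list into a comma-separated string."""
    return [
        "-S", samplesheet_path,
        "-SI", ",".join(sequencer_ids),
        "-P", ",".join(panels),
        "-T", ",".join(tso_panels),
        "-O", ",".join(okd_panels),
        "-D", ",".join(dev_pannos),
        "-L", logdir,
        "-R", runname,
    ]


# Placeholder values for illustration only
args = build_cli_args("RUN1_SampleSheet.csv", ["SEQ1", "SEQ2"], ["Pan4567", "Pan9999"],
                      ["Pan9999"], ["Pan8888"], ["Pan0000"], "logs", "RUN1")
print(shlex.join(args))
# -S RUN1_SampleSheet.csv -SI SEQ1,SEQ2 -P Pan4567,Pan9999 -T Pan9999 -O Pan8888 -D Pan0000 -L logs -R RUN1
```

The resulting list could be appended to the interpreter and script path in a `subprocess.run` call once the entry point is known.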

Testing

This repository currently has 93% test coverage.

Test datasets are stored in /test/data. The script has a full test suite:

test_samplesheet_validator.py

See test/README.md for details about test cases.

These tests should be run before pushing any code to ensure all tests in the GitHub Actions workflow pass. These can be run as follows:

python3 -m pytest

N.B. Tests and test cases/files MUST be maintained and updated in step with script development. This includes keeping the arguments passed to pytest in the pytest.ini file up to date.

Logging

Logging is performed by ss_logger. The directory to save the log file to is supplied as an argument. The output log file is named by the script as follows:

${LOGFILE_DIR}/${RUNFOLDER_NAME}_${TIMESTAMP}_samplesheet_validator.log
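For scripts that need to locate the log after a run, the naming pattern can be reproduced as below. This is a hypothetical helper. The timestamp format is an assumption chosen for illustration, and the tool's own format may differ.

```python
from datetime import datetime
from pathlib import Path


def log_file_path(logdir: str, runfolder_name: str, timestamp: str) -> Path:
    """Compose <logdir>/<runfolder_name>_<timestamp>_samplesheet_validator.log."""
    return Path(logdir) / f"{runfolder_name}_{timestamp}_samplesheet_validator.log"


# Assumption: illustrative timestamp format, not necessarily the one the tool uses
stamp = datetime(2024, 1, 31, 9, 30).strftime("%Y%m%d_%H%M%S")
print(log_file_path("/var/log/ss", "RUN1", stamp).name)
# RUN1_20240131_093000_samplesheet_validator.log
```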

The script also collects error messages (in errors_dict) as it runs, so other scripts can access them when this script is imported as a module.

Developed by the Synnovis Genome Informatics Team
