Releases: nanoporetech/medaka
v2.2.1
Added
- tandem: add CRAM input support with optional reference FASTA for alignment reading.
- tandem: support gzipped BED region inputs.
Changed
- Stopped building medaka-cpu PyPI packages.
- Upgraded pyabpoa to version 1.5.6.
- tandem: fail fast on invalid inputs.
- Upgraded bundled HTSlib from 1.14 to 1.20.
- Updated build environment for arm wheels from manylinux2014 to manylinux_2_28.
Fixed
- The newly added -p prefix option to medaka_consensus was not correctly recognized.
- Save optimiser state in corresponding training folder.
- Training with --model pointing to a model archive now correctly loads the weights
for the model rather than just creating a model with the same parameters but random
weights.
- Move some inference options into correct parser section.
- Corrected learning-rate schedule for samples_per_training_epoch.
- Turn off scheduler update during validation.
- Fix command-line parsing so schedule is off by default.
- Prevent empty columns from potentially appearing in read-level features by allowing
the insertion of reads whose alignment started while there were no available rows in
the feature tensor.
v2.2.0
Added
- Python 3.13 support.
- medaka_consensus now has a -p <prefix> option to set the prefix for the
output consensus file (rather than always being consensus).
Fixed
- Issue where samtools version detection would stumble on meta-information
lines in some builds of samtools.
Removed
- Python 3.9 support (EOL).
Changed
- medaka features fails if encoder expects dwells but data does not contain
move tables, or warns if available move tables are not being used.
- During training, warning is output if model is not using available dwell
information.
- Make more of an attempt to set threading appropriately for differing
behaviours of pytorch and oneDNN builds.
- Replace read level features 3-dimensional tensor with structured numpy array
for clarity and easier extension.
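
The move from a 3-dimensional tensor to a structured numpy array can be illustrated with a minimal sketch. The field names (base, qual, dwell, strand) and the array shape are hypothetical and are not taken from medaka's actual feature layout; the point is only that named, typed fields are easier to read and extend than positional channels.

```python
import numpy as np

n_reads, n_positions = 4, 6

# Plain 3-D tensor: the meaning of each channel is known only by convention.
tensor = np.zeros((n_reads, n_positions, 3), dtype=np.float32)
tensor[..., 1] = 20.0  # "channel 1 is quality" has to be remembered elsewhere

# Structured array: each channel becomes a named, typed field.
# These field names are hypothetical, chosen for illustration only.
feature_dtype = np.dtype([("base", "u1"), ("qual", "u1"), ("dwell", "u2")])
features = np.zeros((n_reads, n_positions), dtype=feature_dtype)
features["qual"] = 20

# Extending the layout: build a wider dtype and copy the existing fields by name.
extended_dtype = np.dtype(feature_dtype.descr + [("strand", "i1")])
extended = np.zeros(features.shape, dtype=extended_dtype)
for name in feature_dtype.names:
    extended[name] = features[name]
```

Code that reads `features["qual"]` keeps working when a new field is added, whereas code indexing `tensor[..., 1]` depends on the channel order never changing.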
v2.1.1
Added
- Python 3.12 support
Fixed
- Issue with checking for the presence of dwell information in fastq files.
Changed
- Behaviour of medaka_consensus with --bacteria option: if the basecaller model cannot
be parsed or is not compatible with the bacterial polishing model, exit with an error instead of
falling back to default model.
- Replaced pkg_resources with importlib
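
For readers making a similar pkg_resources to importlib migration in their own code, here is a minimal sketch of one common case, looking up an installed package version with the standard-library importlib.metadata module. The notes do not say which importlib facilities medaka uses, so this is an assumption for illustration, and the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution_name):
    """Return the installed version string, or None if the package is absent.

    Roughly replaces pkg_resources.get_distribution(name).version.
    Hypothetical helper, not part of medaka.
    """
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# Example lookup; the result depends on the local environment.
medaka_version = installed_version("medaka")
```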
Removed
- Python 3.8 support (EOL).
v2.1.0
Fixed
- Updated documentation with inference and sequence command renaming.
- Changed default model resolved from bam file from variant to consensus.
- Fixed issue with initializing inference in Medaka tandem model.
- Fixed a memory leak in the Medaka C library and removed redundant memory objects to reduce the footprint.
Changed
- Fully refactored and redesigned medaka tandem code and optimised CPU-based execution.
- Read-level models cannot be used with medaka tandem.
- get_trimmed_reads now also returns the phase-set, hap and read ids.
Added
- Consensus models for v5.2.0 basecaller models.
- Added support for read-level consensus models for v5.0.0 and v5.2.0 basecaller models.
- Models dna_r10.4.1_e8.2_5khz_400bps_sup and dna_r10.4.1_e8.2_5khz_400bps_hac added
as aliases to those without _5khz_ in their names.
- Added -B option to medaka_consensus to allow passing a bed file or region to polish
via medaka inference --regions.
- Added --cpu option to medaka inference to force CPU and avoid searching for GPUs.
- New output format for medaka tandem tailored for population studies.
- New fields to medaka tandem output: depth, read lengths, read names, phase sets, and MAD of read lengths.
- Read length–based outlier detection in medaka tandem.
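
As a rough illustration of the MAD (median absolute deviation) of read lengths and of read length–based outlier detection, the sketch below flags reads whose length lies far from the median. The notes do not describe the rule medaka tandem actually applies, so the function names and the n_mads threshold are assumptions made for this example only.

```python
import numpy as np


def mad(values):
    """Median absolute deviation around the median."""
    values = np.asarray(values, dtype=float)
    return float(np.median(np.abs(values - np.median(values))))


def flag_length_outliers(read_lengths, n_mads=3.0):
    """Return a boolean mask marking reads with unusual lengths.

    n_mads is a hypothetical threshold, not a medaka parameter.
    """
    lengths = np.asarray(read_lengths, dtype=float)
    centre = np.median(lengths)
    spread = mad(lengths)
    if spread == 0:
        # All typical reads share one length: anything different is an outlier.
        return lengths != centre
    return np.abs(lengths - centre) > n_mads * spread


lengths = [1000, 1010, 990, 1005, 995, 2500]
mask = flag_length_outliers(lengths)  # only the 2500 bp read is flagged
```

The median and MAD are used instead of the mean and standard deviation because a single very long or very short read barely shifts them.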
v2.0.1
Fixed
- medaka smolecule was broken by change from medaka consensus to medaka inference.
Changed
- Improved error message when model is not found.
v2.0.0
Switched from tensorflow to pytorch.
Existing models for recent basecallers have been converted to the new format.
Pytorch format models contain a _pt suffix in the filename.
Changed
- Inference is now performed using PyTorch instead of TensorFlow.
- The medaka consensus command has been renamed to medaka inference to reflect
its function in running an arbitrary model and avoid confusion with medaka_consensus.
- The medaka stitch command has been renamed to medaka sequence to reflect its
function in creating a consensus sequence.
- The medaka variant command has been renamed to medaka vcf to reflect its function
in consolidating variants and avoid confusion with medaka_variant.
- Order of arguments to medaka vcf has been changed to be more consistent
with medaka sequence.
- The helper script medaka_haploid_variant has been renamed medaka_variant to
save typing.
- Make --ignore_read_groups option available to more medaka subcommands including inference.
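
The renames listed above can be summarised as a lookup table, which may help when updating scripts written for 1.x. This is a minimal sketch: the helper update_command is hypothetical and not part of medaka, it only rewrites command names, and it does not reorder the arguments of medaka vcf, which these notes say have changed.

```python
# Old -> new names, taken from the renames listed in these notes.
SUBCOMMAND_RENAMES = {
    "consensus": "inference",
    "stitch": "sequence",
    "variant": "vcf",
}
SCRIPT_RENAMES = {"medaka_haploid_variant": "medaka_variant"}
REMOVED_SUBCOMMANDS = {"snp"}


def update_command(argv):
    """Rewrite a 1.x style command line (list of strings) to 2.0.0 names.

    Hypothetical helper for illustration. Only names are changed; argument
    order is left untouched and must be checked by hand for `medaka vcf`.
    """
    argv = list(argv)
    if not argv:
        return argv
    if argv[0] in SCRIPT_RENAMES:
        argv[0] = SCRIPT_RENAMES[argv[0]]
    elif argv[0] == "medaka" and len(argv) > 1:
        if argv[1] in REMOVED_SUBCOMMANDS:
            raise ValueError(f"'medaka {argv[1]}' was removed in v2.0.0")
        argv[1] = SUBCOMMAND_RENAMES.get(argv[1], argv[1])
    return argv


new_cmd = update_command(["medaka", "stitch", "ARGS"])
```

The wrapper script medaka_consensus is deliberately left unchanged, since only the medaka consensus subcommand was renamed.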
Removed
- The medaka snp command has been removed. This was long defunct as diploid SNP calling
had been deprecated, and medaka variant is used to create VCFs for current models.
- Loading models in hdf format has been deprecated.
- Deleted minimap2 and racon wrappers in medaka/wrapper.py.
Added
- Release conda packages for Linux (x86 and aarch64) and macOS (arm64).
- Option --lr_schedule allows using cosine learning rate schedule in training.
- Option --max_valid_samples to set number of samples in a training validation batch.
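
For readers unfamiliar with cosine learning rate schedules, the sketch below shows the standard cosine annealing formula in plain Python. It is not medaka's implementation: the function name, the parameters, and the absence of warm-up or restarts are assumptions made for illustration.

```python
import math


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Learning rate at `step`, decaying from base_lr to min_lr along a half cosine."""
    # Clamp so that steps outside [0, total_steps] stay at the end points.
    progress = min(max(step, 0), total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Learning rates at the start, middle, and end of a 100-step run.
schedule = [cosine_lr(s, total_steps=100, base_lr=1e-3) for s in (0, 50, 100)]
```

The rate starts at base_lr, passes through the midpoint of base_lr and min_lr halfway through training, and reaches min_lr at the final step.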
Fixed
- Training models with DiploidLabelScheme uses categorical cross-entropy loss
instead of binary cross-entropy.
v1.12.1
(Probably) final version of medaka using tensorflow. Future versions will use
pytorch instead.
Fixed
- medaka_consensus: only keep bam tags if input file matches joint polishing pipeline.
- Pin numpy to <2.0.0.
Added
- Consensus and variant models lookup for v3.5.1 Dorado models.
v1.12.0
Fixed
- tandem: Use haplotag 0 in unphased mode.
- tandem: Don't run consensus if regions set is empty.
Added
- Models for version 5 basecaller models.
- Expose sym_indels option for training.
- Expose --min_mapq minimum mapping quality alignment filtering option for medaka consensus.
- tandem: Option --ignore_read_groups to ignore read groups present in input file.
- Wrapper script medaka_consensus_joint and convenience tools (prepare_tagged_bam,
get_model_dtypes) to facilitate joint polishing with multiple datatypes.
v1.11.3
Added
- Consensus and variant models for v4.3.0 dorado models.
v1.11.2
Added
- Parsing model information from fastq headers output by Guppy and MinKNOW.
Changed
- Additional explanatory information in VCF INFO fields concerning depth calculations.