Releases: nanoporetech/medaka

v2.2.1

09 Apr 09:01

Added

  • tandem: add CRAM input support with optional reference FASTA for alignment reading.
  • tandem: support gzipped BED region inputs.
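For readers who prepare region files in their own scripts, the following is a minimal sketch of reading a BED file whether or not it is gzipped. It is not medaka's implementation: `read_bed_regions` is a hypothetical helper, and detecting compression from the gzip magic bytes (rather than the file extension) is an assumption made for illustration.

```python
import gzip


def read_bed_regions(path):
    """Return (chrom, start, end) tuples from a plain or gzipped BED file.

    Hypothetical helper for illustration; not part of medaka.
    """
    # gzip streams begin with the two magic bytes 0x1f 0x8b.
    with open(path, "rb") as fh:
        is_gzipped = fh.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzipped else open
    regions = []
    with opener(path, "rt") as fh:
        for line in fh:
            line = line.strip()
            # Skip blank lines and common BED comment/header lines.
            if not line or line.startswith(("#", "track ", "browser ")):
                continue
            chrom, start, end = line.split("\t")[:3]
            regions.append((chrom, int(start), int(end)))
    return regions
```

Only the first three BED columns are used; any extra columns are ignored.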

Changed

  • Stopped building medaka-cpu pypi packages.
  • Upgraded pyabpoa to version 1.5.6.
  • tandem: fail fast on invalid inputs.
  • Upgraded bundled HTSlib from 1.14 to 1.20.
  • Updated build environment for arm wheels from manylinux2014 to manylinux_2_28.

Fixed

  • The newly added -p prefix option to medaka_consensus was not correctly recognized.
  • Save optimiser state in corresponding training folder.
  • Training with --model pointing to a model archive now correctly loads the weights
    for the model rather than just creating a model with the same parameters but random
    weights.
  • Move some inference options into correct parser section.
  • Corrected learning-rate schedule for samples_per_training_epoch.
  • Turn off scheduler update during validation.
  • Fix command-line parsing so schedule is off by default.
  • Prevent empty columns from potentially appearing in read-level features by allowing
    the insertion of reads whose alignment started while there were no available rows in
    the feature tensor.

v2.2.0

08 Dec 15:30

Added

  • Python 3.13 support.
  • medaka_consensus now has a -p <prefix> option to set the prefix of the
    output consensus file (rather than always using consensus).

Fixed

  • Issue where samtools version detection would stumble on meta-information
    lines in some builds of samtools.
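The notes do not say what the meta-information lines look like or how the fix was made. As a general illustration of tolerant version detection, the sketch below scans every line of the `samtools --version` output for the version line instead of assuming it comes first. `parse_samtools_version` is a hypothetical helper, not a medaka function.

```python
import re


def parse_samtools_version(version_output):
    """Extract (major, minor) from the text printed by `samtools --version`.

    Hypothetical helper: scans all lines rather than trusting the first one.
    """
    pattern = re.compile(r"^samtools\s+(\d+)\.(\d+)")
    for line in version_output.splitlines():
        match = pattern.match(line.strip())
        if match:
            return int(match.group(1)), int(match.group(2))
    raise ValueError("no samtools version line found")
```

Returning integers lets the caller compare versions numerically, so that 1.9 sorts before 1.20.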

Removed

  • Python 3.9 support (EOL).

Changed

  • medaka features now fails if the encoder expects dwells but the data does not contain
    move tables, and warns if available move tables are not being used.
  • During training, a warning is output if the model is not using available dwell
    information.
  • Make more of an attempt to set threading appropriately for differing
    behaviours of pytorch and oneDNN builds.
  • Replace read level features 3-dimensional tensor with structured numpy array
    for clarity and easier extension.

v2.1.1

19 Aug 14:03

Added

  • Python 3.12 support.

Fixed

  • Issue with checking for the presence of dwell information in fastq files.

Changed

  • Behaviour of medaka_consensus with --bacteria option: if the basecaller model cannot
    be parsed or is not compatible with the bacterial polishing model, exit with an error instead of
    falling back to default model.
  • Replaced pkg_resources with importlib.

Removed

  • Python 3.8 support (EOL).

v2.1.0

23 May 08:25

Fixed

  • Updated documentation with inference and sequence command renaming.
  • Changed the default model resolved from a BAM file from the variant model to the consensus model.
  • Fixed issue with initializing inference in Medaka tandem model.
  • Fixed a memory leak in the Medaka C library and removed redundant memory objects to reduce the footprint.

Changed

  • Fully refactored and redesigned medaka tandem code and optimised CPU-based execution.
  • Read-level models cannot be used with medaka tandem.
  • get_trimmed_reads now also returns the phase-set, hap and read ids.

Added

  • Consensus models for v5.2.0 basecaller models.
  • Added support for read-level consensus models for v5.0.0 and v5.2.0 basecaller models.
  • Models dna_r10.4.1_e8.2_5khz_400bps_sup and dna_r10.4.1_e8.2_5khz_400bps_hac added
    as aliases to those without _5khz_ in their names.
  • Added -B option to medaka_consensus to allow passing a bed file or region to polish
    via medaka inference --regions.
  • Added --cpu option to medaka inference to force CPU and avoid searching for GPUs.
  • New output format for medaka tandem tailored for population studies.
  • New fields to medaka tandem output: depth, read lengths, read names, phase sets, and MAD of read lengths.
  • Read length–based outlier detection in medaka tandem.
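The notes mention the MAD (median absolute deviation) of read lengths and length-based outlier detection but do not give the rule medaka tandem applies. The sketch below shows one common MAD-based approach; the function name `mad_outliers` and the threshold of 3 are assumptions for illustration only.

```python
from statistics import median


def mad_outliers(lengths, threshold=3.0):
    """Return (mad, outliers) for a list of read lengths.

    A length is flagged when its distance from the median exceeds
    threshold * MAD. The threshold value is an illustrative assumption.
    """
    med = median(lengths)
    deviations = [abs(x - med) for x in lengths]
    mad = median(deviations)
    if mad == 0:
        # At least half the reads sit exactly on the median; flag nothing
        # rather than treating every other read as an outlier.
        return mad, []
    outliers = [x for x, d in zip(lengths, deviations) if d > threshold * mad]
    return mad, outliers
```

For the lengths `[100, 102, 98, 101, 99, 250]` the median is 100.5 and the MAD is 1.5, so only the 250-base read is flagged.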

v2.0.1

11 Oct 08:57

Fixed

  • medaka smolecule was broken by the change from medaka consensus to medaka inference.

Changed

  • Improved error message when model is not found.

v2.0.0

11 Sep 13:55

Switched from TensorFlow to PyTorch.

Existing models for recent basecallers have been converted to the new format.
PyTorch-format models contain a _pt suffix in the filename.

Changed

  • Inference is now performed using PyTorch instead of TensorFlow.
  • The medaka consensus command has been renamed to medaka inference to reflect
    its function in running an arbitrary model and avoid confusion with medaka_consensus.
  • The medaka stitch command has been renamed to medaka sequence to reflect its
    function in creating a consensus sequence.
  • The medaka variant command has been renamed to medaka vcf to reflect its function
    in consolidating variants and avoid confusion with medaka_variant.
  • Order of arguments to medaka vcf has been changed to be more consistent
    with medaka sequence.
  • The helper script medaka_haploid_variant has been renamed medaka_variant to
    save typing.
  • Make --ignore_read_groups option available to more medaka subcommands including inference.
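For pipelines written against pre-2.0.0 releases, the renames listed above can be captured in a small lookup table. The sketch below is a hypothetical migration aid, not something shipped with medaka. It rewrites command names only and assumes the subcommand is the second token on the command line.

```python
# Renames taken from the v2.0.0 notes (old name -> new name).
SUBCOMMAND_RENAMES = {
    "consensus": "inference",
    "stitch": "sequence",
    "variant": "vcf",
}
SCRIPT_RENAMES = {"medaka_haploid_variant": "medaka_variant"}


def translate_command(argv):
    """Return argv with pre-2.0.0 command names replaced by their new names.

    Only names are rewritten; all arguments are passed through untouched.
    """
    argv = list(argv)
    if not argv:
        return argv
    if argv[0] in SCRIPT_RENAMES:
        argv[0] = SCRIPT_RENAMES[argv[0]]
    elif argv[0] == "medaka" and len(argv) > 1:
        argv[1] = SUBCOMMAND_RENAMES.get(argv[1], argv[1])
    return argv
```

Because the argument order of medaka vcf also changed, a translated medaka variant call still needs its arguments checked by hand.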

Removed

  • The medaka snp command has been removed. This was long defunct as diploid SNP calling
    had been deprecated, and medaka variant is used to create VCFs for current models.
  • Loading models in hdf format has been deprecated.
  • Deleted minimap2 and racon wrappers in medaka/wrapper.py.

Added

  • Release conda packages for Linux (x86 and aarch64) and macOS (arm64).
  • Option --lr_schedule allows using cosine learning rate schedule in training.
  • Option --max_valid_samples to set number of samples in a training validation batch.
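The notes do not describe how medaka parameterises its cosine schedule. As background, the sketch below implements the standard cosine annealing formula; the function name and the parameters `total_steps`, `base_lr` and `min_lr` are illustrative assumptions, not medaka options.

```python
import math


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine-annealed learning rate at a given training step.

    Assumes total_steps > 0; steps outside [0, total_steps] are clamped.
    """
    step = min(max(step, 0), total_steps)
    # The factor falls smoothly from 1 at step 0 to 0 at total_steps.
    factor = 0.5 * (1.0 + math.cos(math.pi * step / total_steps))
    return min_lr + (base_lr - min_lr) * factor
```

The rate starts at `base_lr`, passes through the midpoint of `base_lr` and `min_lr` halfway through training, and ends at `min_lr`.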

Fixed

  • Training models with DiploidLabelScheme uses categorical cross-entropy loss
    instead of binary cross-entropy.

v1.12.1

12 Jul 13:56

(Probably) final version of medaka using TensorFlow. Future versions will use
PyTorch instead.

Fixed

  • medaka_consensus: only keep bam tags if input file matches joint polishing pipeline.
  • Pin numpy to <2.0.0.

Added

  • Consensus and variant models lookup for v3.5.1 Dorado models.

v1.12.0

20 May 10:04

Fixed

  • tandem: Use haplotag 0 in unphased mode.
  • tandem: Don't run consensus if regions set is empty.

Added

  • Models for version 5 basecaller models.
  • Expose sym_indels option for training.
  • Expose --min_mapq minimum mapping quality alignment filtering option for medaka consensus.
  • tandem: Option --ignore_read_groups to ignore read groups present in input file.
  • Wrapper script medaka_consensus_joint and convenience tools (prepare_tagged_bam,
    get_model_dtypes) to facilitate joint polishing with multiple datatypes.

v1.11.3

06 Dec 14:28

Added

  • Consensus and variant models for v4.3.0 dorado models.

v1.11.2

29 Nov 22:14

Added

  • Parsing model information from fastq headers output by Guppy and MinKNOW.

Changed

  • Additional explanatory information in VCF INFO fields concerning depth calculations.