Overview

The module assembles Oxford Nanopore Technologies (ONT) reads supplied as FASTQ files. It consists of the following tools:

  • Trimming
    • chopper: Applies a headcrop (trims the start of each read) and a tailcrop (trims the end of each read). Reads are filtered by average read quality and by minimum or maximum read length (turned on by default with quality 10 and minimum length 500).
  • Subsampling (and filtering)
    • rasusa: Randomly subsamples reads of different lengths to a specified coverage, in contrast to filtlong (turned on by default with coverage 100). Citation: M.B. Hall (2022). Rasusa: Randomly subsample sequencing reads to a specified coverage.
    • filtlong: Filters long reads by quality, preferring longer reads, and subsamples them (by default with coverage 100). May be beneficial when subsampling RBK and especially RPBK data.
  • De novo assembly
    • Raven: Overlap-layout-consensus assembler that accelerates the overlap step, builds an assembly graph from reads pre-processed with pile-o-grams, and polishes the unambiguous graph paths with Racon. Does not correct the raw reads. Reports circularity and assembled coverage of contigs and is deterministic (default ONT assembler).
    • Flye: Uses a repeat graph as the core data structure. Compared to de Bruijn graphs, which require exact k-mer matches, repeat graphs are built using approximate sequence matches and can therefore tolerate noisier reads. Does not correct the raw reads (in contrast to the Canu assembler). Reports circularity and assembled coverage of contigs. Runs with the --nano-hq option by default. Flye is not fully deterministic, i.e., re-analyzing the same dataset does not always give exactly the same results.


  • Polishing
    • medaka: Creates consensus sequences from nanopore sequencing data using neural networks applied to a pileup of individual sequencing reads against a draft assembly. Corrects only the FASTA consensus and not the FASTQ raw reads. If rasusa or filtlong was applied, medaka uses the subsampled reads only (turned on by default with model r1041_e82_400bps_sup_v4.3.0 for dorado v5.0.2 basecaller (dna_r10.4.1_e8.2_400bps_sup@v4.3.0)).
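
The difference between the two subsampling strategies listed above can be illustrated with a small Python sketch. It is not the code of rasusa or filtlong: the read records, the `length * quality` score, and all function names are hypothetical simplifications chosen for illustration. The only idea taken from the text is that both tools reduce the reads to a target coverage, which here is assumed to mean roughly coverage × genome size bases.

```python
import random
from typing import List, Tuple

# (read id, length in bases, mean quality): a simplified, hypothetical read record
Read = Tuple[str, int, float]


def target_bases(coverage: float, genome_size: int) -> int:
    """Bases needed to reach the requested coverage (coverage x genome size)."""
    return int(coverage * genome_size)


def take_until(reads: List[Read], budget: int) -> List[Read]:
    """Keep reads in the given order until the base budget is reached."""
    kept: List[Read] = []
    total = 0
    for read in reads:
        if total >= budget:
            break
        kept.append(read)
        total += read[1]
    return kept


def random_subsample(reads: List[Read], coverage: float,
                     genome_size: int, seed: int = 0) -> List[Read]:
    """Random strategy: shuffle first, so read length and quality play no role."""
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    return take_until(shuffled, target_bases(coverage, genome_size))


def best_first_subsample(reads: List[Read], coverage: float,
                         genome_size: int) -> List[Read]:
    """Score-based strategy: rank by an assumed score (length * quality) first."""
    ranked = sorted(reads, key=lambda r: r[1] * r[2], reverse=True)
    return take_until(ranked, target_bases(coverage, genome_size))


reads = [("a", 1000, 10.0), ("b", 5000, 15.0), ("c", 2000, 12.0), ("d", 500, 20.0)]
print([r[0] for r in best_first_subsample(reads, coverage=6, genome_size=1000)])
# → ['b', 'c']
```

With the default coverage of 100 and an assumed 5 Mb genome, `target_bases(100, 5_000_000)` gives a budget of 500,000,000 bases. The random strategy keeps the length distribution of the input, while the score-based strategy shifts it toward long, high-quality reads.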

ONTAssemblyModule.png
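
For readers who want to reproduce a comparable tool chain outside the module, the following Python sketch assembles command lines with the defaults named above (quality 10, minimum length 500, coverage 100, --nano-hq, and the medaka model). It only builds the argument lists and does not run anything. The file names, the genome size parameter, and the `Step` and `build_steps` helpers are hypothetical. The flags follow the tools' public command-line interfaces as far as known and may differ between versions (for example, older rasusa releases do not use the `reads` subcommand). How the module itself invokes the tools is not stated here.

```python
from typing import Dict, List, NamedTuple, Optional

# Default medaka model named in the tool list above
MEDAKA_MODEL = "r1041_e82_400bps_sup_v4.3.0"


class Step(NamedTuple):
    argv: List[str]               # command and arguments
    stdin: Optional[str] = None   # file piped into the command, if any
    stdout: Optional[str] = None  # file receiving the command's output, if any


def build_steps(reads: str, genome_size: int,
                outdir: str = "out", threads: int = 8) -> Dict[str, Step]:
    """Build illustrative command lines; all paths are hypothetical."""
    trimmed = f"{outdir}/trimmed.fastq"
    subsampled = f"{outdir}/subsampled.fastq"
    draft = f"{outdir}/raven.fasta"
    t = str(threads)
    return {
        # chopper is assumed to read FASTQ from stdin and write to stdout
        "chopper": Step(["chopper", "-q", "10", "-l", "500"],
                        stdin=reads, stdout=trimmed),
        # random subsampling to coverage 100 (assumed rasusa >= 1.0 syntax)
        "rasusa": Step(["rasusa", "reads", "--coverage", "100",
                        "--genome-size", str(genome_size),
                        "-o", subsampled, trimmed]),
        # Raven writes the assembly to stdout
        "raven": Step(["raven", "-t", t, subsampled], stdout=draft),
        # alternative assembler, run with --nano-hq as stated above
        "flye": Step(["flye", "--nano-hq", subsampled,
                      "--out-dir", f"{outdir}/flye", "--threads", t]),
        # polishing uses the subsampled reads only
        "medaka": Step(["medaka_consensus", "-i", subsampled, "-d", draft,
                        "-o", f"{outdir}/medaka", "-t", t, "-m", MEDAKA_MODEL]),
    }


for name, step in build_steps("reads.fastq", genome_size=5_000_000).items():
    print(name, " ".join(step.argv))
```

Keeping each step as a plain argument list makes it easy to inspect or log the commands before passing them to a process runner of choice.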

Hybrid assemblies are currently not supported. For further information see our long-read de novo assembler evaluation and ONT native (NBK), rapid (RBK), and rapid PCR (RPBK) barcoding kits contiguities and accuracies evaluation.

Requirements

The ONT Data Assembly Module (beta) is part of the Long-read Data Analysis Bundle [LDAB], which is charged separately.

Important: