Supported Data Files

Long-read data (PacBio and Oxford Nanopore Technologies) can be processed as FASTA files. Two predefined Procedure Details can be used when importing files manually or in a pipeline. Currently, the pipelines of the vendors (e.g., SMRT Link for PacBio, or Chopper, Flye, and Medaka for ONT) must be used for base-calling, assembly, polishing, and circularization.

Optionally, topology and coverage information can be extracted from the contig headers of the FASTA files if it is stated as follows:

  • The first row of each contig should contain the term topology=circular if the contig contains a complete circular plasmid or chromosome. This term is used to define circular plasmids for Chromosome and Plasmids Overview Task Template processing. Knowing whether a contig is circular might also improve the MOB-suite plasmid reconstruction process.
  • The first row of each contig may also contain a term like coverage=123 to specify the coverage value of this contig. Contigs with coverage=0 are excluded from the coverage calculation. All contigs in the FASTA file must have coverage information; otherwise, no average assembled coverage is calculated. Knowing the average coverage helps with QC.

In this way, NCBI-conformant naming of the contigs can also be achieved; e.g., for a circular chromosome:
>contig1_1710375900 [topology=circular][completeness=complete][chromosome];5261576;coverage=29
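To check headers before import, the rules above can be sketched in Python. This is only an illustration under stated assumptions, not the software's own parser: the function names (parse_fasta, contig_info, average_coverage) are hypothetical, and the length-weighted mean is an assumed definition of "average assembled coverage", since the exact formula is not given here.

```python
import re

# Matches terms such as coverage=29 or coverage=29.5 in a contig header.
COVERAGE_RE = re.compile(r"coverage=(\d+(?:\.\d+)?)")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def contig_info(header, sequence):
    """Extract topology and coverage terms from one contig header."""
    match = COVERAGE_RE.search(header)
    return {
        "name": header.split()[0],
        "circular": "topology=circular" in header,
        "coverage": float(match.group(1)) if match else None,
        "length": len(sequence),
    }


def average_coverage(contigs):
    """Return a length-weighted mean coverage (ASSUMED formula).

    Returns None if any contig lacks a coverage term; contigs with
    coverage=0 are left out of the calculation.
    """
    if any(c["coverage"] is None for c in contigs):
        return None
    used = [c for c in contigs if c["coverage"] > 0]
    total_length = sum(c["length"] for c in used)
    if total_length == 0:
        return None
    return sum(c["coverage"] * c["length"] for c in used) / total_length


example = """>contig1_1710375900 [topology=circular][completeness=complete][chromosome];5261576;coverage=29
ACGTACGT
>contig2 coverage=0
ACGT
"""
contigs = [contig_info(h, s) for h, s in parse_fasta(example)]
print(contigs[0]["circular"], average_coverage(contigs))  # prints: True 29.0
```

The second contig carries coverage=0 and is therefore ignored in the average; removing its coverage term altogether would make average_coverage return None.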

Using a tool like Circlator in the pipeline to fix the start position (and orientation) with its fixstart function helps tremendously for downstream visualization and comparison of chromosomes and plasmids. For chromosomes, Circlator by default uses matches to the dnaA gene for this function. To define the start and orientation of most plasmids, the CGE PlasmidFinder replicon database that is used for rep-typing could be utilized.
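The idea of fixing start and orientation can be illustrated with a small Python sketch. It is not Circlator's implementation: exact string matching stands in for the gene matching a real tool performs, and the names rotate_to_start and reverse_complement are hypothetical. The sketch rotates a circular contig so that it begins at an anchor sequence (for example, the start of a dnaA or replicon gene) and reverse-complements the contig if the anchor lies on the opposite strand.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def _find_circular(seq, anchor):
    """Find anchor in a circular sequence, including across the junction."""
    doubled = seq + seq[: max(len(anchor) - 1, 0)]
    return doubled.find(anchor)


def rotate_to_start(seq, anchor):
    """Rotate (and, if needed, flip) a circular contig to begin at anchor.

    Returns the re-oriented sequence, or None if the anchor is found on
    neither strand. Exact matching only; a simplification for illustration.
    """
    for candidate in (seq, reverse_complement(seq)):
        pos = _find_circular(candidate, anchor)
        if pos != -1:
            return candidate[pos:] + candidate[:pos]
    return None


# Anchor spans the end/start junction of the circular contig.
print(rotate_to_start("TGCCCA", "ATG"))  # prints: ATGCCC
```

Because all contigs end up starting at the same gene and in the same orientation, replicons from different samples line up directly in downstream comparisons.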

Importing Run Info

PacBio run information can be imported.