Version 7.0

Major Changes

New virulence profiling task templates, based on the Virulence Factor Database (VFDB), for 54 species of medical importance.

New task template for antimicrobial resistance gene finding, based on the NCBI AMRFinderPlus tool and database, for all currently supported species except MtbC (requires Linux or Windows Subsystem for Linux [WSL] on Windows 10 / Windows Server 2019).

In new scripts, reference mapping pipelines now use by default the BWA-MEM algorithm, which is 3-4 times faster than BWA-SW (requires Linux or WSL). The two algorithms rarely produce small allelic differences (1-2 alleles) and occasionally find a slightly different number of targets.

Improved default speed of the target scanning step for larger seed genomes (>= 6 Mbases; no public cgMLST scheme at the time of this release has a seed genome of this size). For those genomes, the external BLAST+ is used instead of our internal BLAST tool. The effect is especially pronounced on computers with more than 4 cores or for genomes larger than 7 Mbases. BLAST+ and BLAST will not produce any allelic differences, but in some samples the two tools can find a slightly different number of targets.

Shortcut function in comparison table creation dialog for storing selected fields as project, general, or system defaults for metadata fields shown in comparison table and exported fields of samples.

Support for Illumina adapter trimming with Trimmomatic directly in pipelines.

Support for FastANI via the tools menu for fast computation of whole-genome ANI (Average Nucleotide Identity) to estimate the similarity to a reference genome and to define species boundaries (requires Linux or WSL).

GC content change filter and column in Group Specific SNP result table for better support of high-resolution melting curve (HRMC) assays.
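
As a rough illustration of why a GC content change matters for HRMC assays, the following Python sketch classifies a single SNP by whether it adds, removes, or preserves a G/C base. The function name `gc_change` and the +1/0/-1 encoding are assumptions made for this example; they do not describe how the result table column is actually computed.

```python
GC_BASES = {"G", "C"}


def gc_change(ref_base: str, alt_base: str) -> int:
    """Return +1 if the SNP gains a G/C base, -1 if it loses one, 0 otherwise.

    Hypothetical helper for illustration only.
    """
    ref_is_gc = ref_base.upper() in GC_BASES
    alt_is_gc = alt_base.upper() in GC_BASES
    # bool arithmetic: True - False == 1, False - True == -1
    return int(alt_is_gc) - int(ref_is_gc)


def changes_gc_content(ref_base: str, alt_base: str) -> bool:
    """Filter predicate: keep only SNPs that alter GC content."""
    return gc_change(ref_base, alt_base) != 0
```

Under this sketch, an A>G SNP counts as a GC gain, while a G>C or A>T SNP leaves the GC content unchanged and would be filtered out.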

QC Improvements

By default, reads are now remapped with BWA and a consensus is called as a polishing step for SKESA and SPAdes de novo assemblies. As SPAdes is known to create inaccurate alignments for data with low coverage, a minimum base coverage of 5 is required by default to call a non-ambiguous base. For SKESA, no minimum coverage is applied by default, as it already uses more conservative heuristics than SPAdes. For more details, see our updated de novo assembler evaluation documentation.

The consensus caller for BWA reference mapping in new pipeline scripts now requires by default a minimum base coverage of 5 to call a non-ambiguous base if the estimated average coverage (unassembled) is below 50. If the average coverage is above 50, no minimum coverage is applied.
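
The coverage rule above can be sketched in a few lines of Python. This is only an illustration of the stated thresholds, not the consensus caller itself: the function names are hypothetical, the use of 'N' for an ambiguous base is an assumption, and the behavior at an average coverage of exactly 50 is not specified in this note (the sketch treats it as "no minimum").

```python
from typing import Optional

LOW_COVERAGE_CUTOFF = 50  # estimated average coverage (unassembled), from the note
MIN_BASE_COVERAGE = 5     # minimum base coverage, from the note


def required_min_base_coverage(estimated_avg_coverage: float) -> Optional[int]:
    """Return the minimum base coverage to enforce, or None for no minimum."""
    if estimated_avg_coverage < LOW_COVERAGE_CUTOFF:
        return MIN_BASE_COVERAGE
    # Assumption: exactly 50 is handled like "above 50".
    return None


def call_base(base: str, depth: int, estimated_avg_coverage: float) -> str:
    """Return the base, or 'N' (assumed ambiguity symbol) if depth is too low."""
    required = required_min_base_coverage(estimated_avg_coverage)
    if required is not None and depth < required:
        return "N"
    return base
```

For example, a position covered by 3 reads would be called ambiguous in a sample with an estimated average coverage of 30, but would keep its base in a sample with an estimated average coverage of 80.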

New default Start/Stop codon check in target QC of all cgMLST and Accessory task templates. If enabled (the check cannot be disabled in public schemes), the check fails if no start codon is found at the beginning, no stop codon is found at the end, or a stop codon is found at a wrong position in a target. Therefore, re-analysis may produce slightly more failed targets (<1%) in some samples than previously, but never different alleles.
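
The three failure conditions of the check can be illustrated with a minimal Python sketch. It assumes a forward-strand target read in frame from its first base, ATG/GTG/TTG as accepted start codons, and TAA/TAG/TGA as stop codons; the codon sets and the function name `start_stop_check` are assumptions for this example and may differ from what the software actually accepts.

```python
from typing import List

START_CODONS = {"ATG", "GTG", "TTG"}  # assumption: common bacterial start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}   # assumption: standard stop codons


def start_stop_check(target: str) -> List[str]:
    """Return the list of failure reasons; an empty list means the check passes."""
    seq = target.upper()
    complete = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, complete, 3)]
    ends_in_frame = len(seq) % 3 == 0

    failures = []
    if not codons or codons[0] not in START_CODONS:
        failures.append("no start codon at beginning")
    if not (ends_in_frame and codons and codons[-1] in STOP_CODONS):
        failures.append("no stop codon at end")
    # Every codon before the terminal one must not be a stop codon.
    internal = codons[:-1] if ends_in_frame else codons
    if any(codon in STOP_CODONS for codon in internal):
        failures.append("stop codon at wrong position")
    return failures
```

A target such as `ATGAAATAA` passes this sketch, whereas `ATGTAAAAATAA` fails because of the premature stop codon in the second position.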

Comparison table QC fields (e.g., average coverage) are now highlighted with traffic light colors if specific thresholds are not met.

The Export Sample Contig and SPEC Files dialog was revised and now offers the possibility to filter epidemiological metadata.

Task Entry Overview tables for resistance/virulence/geno-serotyping Task Templates (e.g., for VFDB) now have a colored threshold legend, version information, and citation(s). Accordingly, Task Entry Overview tables for cgMLST and MLST Task Templates now also show version information and citation(s).

If the remove option is selected in the Handling of Missing Values dialog (e.g., when creating an MST), the affected columns are now removed from the comparison table itself instead of only being excluded from the distance calculation.
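
To illustrate what removing columns means for the comparison table, the following Python sketch drops every target column that has a missing value in at least one sample and then counts allele differences on the reduced table. The data layout (sample name mapped to a dictionary of target name and allele number, with None for missing) and both function names are hypothetical and chosen only for this example.

```python
from typing import Dict, Optional

Profile = Dict[str, Optional[int]]  # target name -> allele number, None if missing


def drop_incomplete_columns(table: Dict[str, Profile]) -> Dict[str, Profile]:
    """Remove every target column that is missing in at least one sample."""
    all_targets = set().union(*(profile.keys() for profile in table.values()))
    kept = [
        target
        for target in sorted(all_targets)
        if all(profile.get(target) is not None for profile in table.values())
    ]
    return {
        sample: {target: profile[target] for target in kept}
        for sample, profile in table.items()
    }


def allele_distance(a: Profile, b: Profile) -> int:
    """Count targets with differing alleles (both profiles share the same columns)."""
    return sum(1 for target in a if a[target] != b[target])
```

With the reduced table, the displayed columns and the columns used for the distance calculation are identical, which is the effect the change above describes.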

Option to create Sample Bookmarks from samples selected in comparison table by right-click menu.

Layout of distance values in MST was revised to prevent overlap with connecting lines.

Clickable links in procedure tab of Sample Overview for Run ID and FastQC detailed results.

Clickable links in results tab of Sample Overview for AMRFinderPlus, VFDB, etc. detailed results.

Option to store NCBI Genome downloads as smaller fasta.gz files and to additionally include genus and species information in the file names.

Organism quick search list in NCBI Genome Browser is no longer sorted alphabetically but by the frequency of entries per species.

Support in NCBI Genome Browser for download of draft genomes with 6-letter accession numbers.

SRA download function for FASTQ files now uses NCBI SRA toolkit download as fallback strategy if the faster direct download fails.

Support for IBM Aspera as an alternative to FTP for uploading FASTQs to EBI ENA (beta version). This requires a separate installation of the Aspera client by the user. Aspera uses an encrypted connection, which may resolve institutional proxy issues with FTP.

Additional database fields 'Source Subtype' and 'ECDC Case ID' were added to Default Bacteria database scheme for support of TESSy XML-file export.

Overview of citations and licenses for other programs, dynamically loaded libraries, and databases and services was revised and moved into the menu entry 'Help | Citations and Licenses'.

Per base read coverage table for target in info button of contig view (when option store read data is enabled).

Option to give a reason that is stored in the task template comment when removing targets from a task template.

Option to immediately clear an old backup warning message in the system administration panel.

Default file format for export of comparison tables was changed from XLS to XLSX Excel format to support up to 16,384 columns. The export of metadata now also supports XLSX.

Default setting in new pipeline scripts for submission of sample data was changed from 'Store only new CT founders on cgMLST.org' to 'Store only new CT founders as anonymized samples on cgMLST.org'.