Introduction

The S. enterica SISTR Geno-Serotyping task template is based on the Salmonella In Silico Typing Resource (SISTR), a bioinformatics tool for rapidly performing in silico serotyping analyses on draft Salmonella genome assemblies [PubMed 26800248]. The serovar prediction module of SISTR uses O (somatic) and H (flagellar) antigen and/or serogroup-specific probes, which provide serovar identification for about 90% (n = 2,190) of serovars. SISTR does not identify or report serovar variants that require biochemical or sub-speciation tests for full characterization. Furthermore, SISTR performs a cgMLST analysis using a scheme of 330 targets defined by strict consensus (cgMLST330). SISTR uses the completeness of the cgMLST330 data to assess genome assembly quality (QC Status). Because draft genome assemblies may yield incomplete data for the antigenic query, the algorithm incorporates logic that allows partial matching of the antigenic formula. When multiple serovars remain possible, SISTR uses the 'phylogenetic context': the query genome is compared against a wide cross-section of genomes from different serovars and subspecies, and the predominant serovar of the genomes within the same cgMLST330 cluster is taken as the most likely serovar.
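The 'phylogenetic context' step amounts to a majority vote over the serovars of the reference genomes that fall into the query's cgMLST330 cluster. The following Python sketch illustrates only that idea and is not SISTR's implementation: the function name `predominant_serovar` and the placeholder serovar names are hypothetical, and restricting the vote to the serovars still compatible with the partial antigenic formula is an assumption made for illustration.

```python
from collections import Counter


def predominant_serovar(cluster_serovars, candidates=None):
    """Return the most frequent serovar among reference genomes in the query's
    cgMLST330 cluster, or None if no genome is eligible.

    cluster_serovars: serovar labels of the reference genomes in the same
        cgMLST330 cluster as the query (hypothetical input).
    candidates: optional set of serovars still compatible with the partial
        antigenic formula; if given, only these labels are counted (assumption).
    """
    counts = Counter(
        serovar
        for serovar in cluster_serovars
        if candidates is None or serovar in candidates
    )
    if not counts:
        return None
    # most_common(1) returns [(label, count)]; ties keep first-seen order.
    return counts.most_common(1)[0][0]


# Hypothetical cluster of reference genomes (placeholder serovar names).
cluster = ["Serovar_A", "Serovar_A", "Serovar_A", "Serovar_B", "Serovar_C", "Serovar_C"]
print(predominant_serovar(cluster))                              # Serovar_A
print(predominant_serovar(cluster, {"Serovar_B", "Serovar_C"}))  # Serovar_C
```

The second call shows how, under the stated assumption, the vote changes once serovars excluded by the antigen data are ignored.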

When a unique serovar is identified based on antigen identification, the SISTR serovar prediction pipeline is complete. The phylogenetic context from cgMLST330 is used for serovar prediction only when antigen-based geno-serotyping is not possible or gives an incomplete result. The phylogenetic context method is also used to determine the subspecies.
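The decision order described above can be sketched as follows. This is a simplified illustration under stated assumptions, not SISTR's code: the function names `predict_serovar` and `predict_subspecies` and the placeholder serovar names are hypothetical, and the fallback here simply takes the predominant serovar of the cluster without any further filtering.

```python
from collections import Counter


def _predominant(labels):
    """Most frequent label, or None for an empty sequence."""
    counts = Counter(labels)
    return counts.most_common(1)[0][0] if counts else None


def predict_serovar(antigen_candidates, cluster_serovars):
    """Return (serovar, source).

    antigen_candidates: serovars compatible with the antigen-based
        geno-serotyping result (empty if antigen typing was not possible).
    cluster_serovars: serovars of the reference genomes in the query's
        cgMLST330 cluster (hypothetical input).
    """
    if len(antigen_candidates) == 1:
        # A unique antigen-based serovar completes the prediction.
        return next(iter(antigen_candidates)), "antigen"
    # Otherwise fall back to the phylogenetic context (cgMLST330 cluster).
    return _predominant(cluster_serovars), "cgMLST330"


def predict_subspecies(cluster_subspecies):
    """The subspecies is always taken from the phylogenetic context."""
    return _predominant(cluster_subspecies)


print(predict_serovar({"Serovar_A"}, ["Serovar_B", "Serovar_B"]))
# ('Serovar_A', 'antigen')
print(predict_serovar({"Serovar_A", "Serovar_B"}, ["Serovar_B", "Serovar_B", "Serovar_A"]))
# ('Serovar_B', 'cgMLST330')
print(predict_subspecies(["enterica", "enterica", "salamae"]))  # enterica
```

Returning the source alongside the serovar mirrors the separate 'Serovar (antigen)' and 'Serovar (cgMLST330)' values shown in the task entry overview.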


Task Entry Overview

Genotyping result table in the Task Entry Overview for an S. enterica SISTR Geno-Serotyping task

When an S. enterica SISTR Geno-Serotyping task template is processed, the following information is shown in the task entry overview:

  • Subspecies (cgMLST330)
  • Serogroup
  • Serovar
  • Serovar (antigen)
  • Serovar (cgMLST330)
  • Antigenic Formula
  • O Type
  • H1 Type
  • H2 Type
  • Found Loci (cgMLST330)
  • Matching Genome (cgMLST330)
  • Matching Alleles (cgMLST330)
  • Distance (cgMLST330)
  • QC Messages
  • QC Status
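The Antigenic Formula combines the O Type, H1 Type and H2 Type in the usual White-Kauffmann-Le Minor order O:H1:H2. The sketch below shows that relationship; the helper names are hypothetical, and writing a missing type as '-' is an assumed convention rather than documented SISTR behavior.

```python
def antigenic_formula(o_type, h1_type, h2_type):
    """Join O, H1 and H2 types as 'O:H1:H2'; an empty or missing type is
    written as '-' (assumed convention)."""
    return ":".join(t if t else "-" for t in (o_type, h1_type, h2_type))


def split_formula(formula):
    """Split an 'O:H1:H2' formula back into its three antigen types."""
    o_type, h1_type, h2_type = formula.split(":")
    return o_type, h1_type, h2_type


print(antigenic_formula("4,[5],12", "i", "1,2"))  # 4,[5],12:i:1,2
print(antigenic_formula("1,9,12", "g,m", None))   # 1,9,12:g,m:-
print(split_formula("4,[5],12:i:1,2"))            # ('4,[5],12', 'i', '1,2')
```

Splitting on ':' is safe here because the individual antigen types use commas and brackets, not colons.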


QC Status and Serovar are colored according to the result of the cgMLST330 QC:

  • Dark green: QC Status = PASS
  • Orange: QC Status = WARNING
  • Red: QC Status = FAIL
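Because the QC Status is derived from the completeness of the cgMLST330 data, the mapping from found loci to status and display color can be sketched as below. The 90% and 95% thresholds are hypothetical placeholders, not SISTR's actual cut-offs, and SISTR's QC may consider further criteria (reported in QC Messages) that this sketch ignores; only the color mapping is taken from the list above.

```python
CGMLST330_TARGETS = 330

# Display colors as listed for the Task Entry Overview.
QC_COLORS = {"PASS": "dark green", "WARNING": "orange", "FAIL": "red"}


def qc_status_from_loci(found_loci, fail_below=0.90, warn_below=0.95):
    """Classify cgMLST330 completeness; both thresholds are hypothetical
    placeholders, not SISTR's actual cut-offs."""
    if not 0 <= found_loci <= CGMLST330_TARGETS:
        raise ValueError("found_loci must be between 0 and 330")
    completeness = found_loci / CGMLST330_TARGETS
    if completeness < fail_below:
        return "FAIL"
    if completeness < warn_below:
        return "WARNING"
    return "PASS"


for loci in (330, 300, 200):
    status = qc_status_from_loci(loci)
    print(loci, status, QC_COLORS[status])
# 330 PASS dark green
# 300 WARNING orange
# 200 FAIL red
```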

Result Fields

Sample result table containing the result fields of the S. enterica SISTR Geno-Serotyping task entry

The task entry stores the following results in searchable database fields:

  • Serogroup
  • Serovar
  • Antigenic Formula
  • O Type
  • H1 Type
  • H2 Type
  • QC Status


All of these result fields are shown in the sample result table. By default, only the two fields 'Serovar' and 'Antigenic Formula' are shown in a comparison table (with a gray column header). If needed, the other result fields can be added to a comparison table via the advanced settings.

The Subspecies (cgMLST330) information is stored in the corresponding field in the 'epi characteristic' section of the epidemiological metadata.

Runtimes

The following table lists the measured SISTR (1.1.1) runtimes for various finished and draft Salmonella genomes on a dual-core desktop computer with 16 GB RAM (using the Windows Subsystem for Linux).


NCBI Accession Number   Contig Count   Genome Size (Mbases)   Runtime
NZ_CP007260             1              4.7                    47s
MYEH00000000            50             4.9                    54s
JYWA00000000            100            5.1                    55s
AUQI00000000            502            4.9                    1m 10s
JAALIX000000000         2147           5.3                    1m 20s
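To compare the runtimes numerically, the values from the table can be converted to seconds. This small Python sketch uses only the numbers in the table; the helper name `to_seconds` is hypothetical. It shows that going from 1 contig (47 s) to 2147 contigs (80 s) increases the runtime about 1.7-fold. Note that genome size also differs between these samples (4.7 vs. 5.3 Mbases), so the increase is not attributable to contig count alone.

```python
import re

# Values copied from the runtime table: (contig count, runtime).
RUNTIMES = {
    "NZ_CP007260": (1, "47s"),
    "MYEH00000000": (50, "54s"),
    "JYWA00000000": (100, "55s"),
    "AUQI00000000": (502, "1m 10s"),
    "JAALIX000000000": (2147, "1m 20s"),
}


def to_seconds(runtime):
    """Convert a runtime such as '47s' or '1m 10s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m\s*)?(\d+)s", runtime.strip())
    if match is None:
        raise ValueError(f"unrecognized runtime: {runtime!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + int(match.group(2))


seconds = {acc: to_seconds(rt) for acc, (_, rt) in RUNTIMES.items()}
print(seconds)
fastest, slowest = min(seconds.values()), max(seconds.values())
print(f"{slowest / fastest:.1f}x")  # 1.7x
```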