Core genome MLST (cgMLST) schemes consist of a fixed set of conserved genes spread across the whole genome. cgMLST schemes are usually species specific. For very closely related species, e.g. the Mycobacterium tuberculosis complex or Brucella species, a scheme is occasionally complex or genus specific. In rare cases a scheme is even specific to a group below the species level, e.g. STEC. All public, stable cgMLST schemes are curated by species experts.

Two types of cgMLST scheme are possible: stable and ad hoc. Stable schemes provide a public, expandable nomenclature, whereas ad hoc schemes provide only a local one. Defining, evaluating, and calibrating a good stable cgMLST scheme is laborious; however, all approved stable schemes are publicly available and can be downloaded from the Task Template Sphere for immediate use. An ad hoc scheme, in contrast, must be set up by users themselves, although this can be done quickly.
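The practical difference between a public and a local nomenclature can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a local registry numbers each distinct allele sequence of a target in order of first observation. The class `LocalAlleleNomenclature` and the target name `geneX` are hypothetical and do not reflect how SeqSphere+ stores alleles internally.

```python
import hashlib
from typing import Dict


class LocalAlleleNomenclature:
    """Hypothetical local allele registry: numbers each distinct allele
    sequence of a target in the order it is first observed."""

    def __init__(self) -> None:
        # target name -> {SHA-256 digest of allele sequence -> local allele number}
        self._alleles: Dict[str, Dict[str, int]] = {}

    def allele_id(self, target: str, sequence: str) -> int:
        # Normalize case so "atg..." and "ATG..." map to the same allele.
        digest = hashlib.sha256(sequence.upper().encode()).hexdigest()
        known = self._alleles.setdefault(target, {})
        if digest not in known:
            known[digest] = len(known) + 1  # next free local number
        return known[digest]


# Two laboratories observe the same two alleles in a different order.
lab_a = LocalAlleleNomenclature()
lab_b = LocalAlleleNomenclature()
lab_a.allele_id("geneX", "ATGAAA")
print(lab_a.allele_id("geneX", "ATGAAC"))  # 2
print(lab_b.allele_id("geneX", "ATGAAC"))  # 1
```

The identical sequence receives allele number 2 in one laboratory and 1 in the other, which is why ad hoc allele numbers cannot be compared across laboratories. A stable scheme avoids this by drawing allele numbers from one shared, public nomenclature.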

Stable and ad hoc cgMLST schemes deliver equally good genotyping results when used to analyze outbreaks. Naturally, with an ad hoc scheme it is by definition not possible to share an allele nomenclature between laboratories. Furthermore, stable cgMLST schemes come with a predefined allele distance threshold for detecting clusters, whereas for ad hoc schemes users define their own thresholds, which are then also used to trigger cluster alerts. Finally, with an ad hoc scheme the percentage of good cgMLST targets may not be a reliable quality control parameter if the scheme was not defined or applied carefully enough.
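How an allele distance threshold turns cgMLST profiles into clusters can be sketched as follows. This is a simplified illustration, not the SeqSphere+ implementation. It assumes that the distance counts targets with differing alleles while ignoring targets missing in either sample, that samples within the threshold (inclusive) are linked by single linkage, and that the percentage of good targets is simply the share of scheme targets with a called allele. All function, sample, and target names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Optional

# A cgMLST profile: target name -> allele number, or None if no allele
# could be called for that target in the sample (assumption for this sketch).
Profile = Dict[str, Optional[int]]


def allele_distance(a: Profile, b: Profile) -> int:
    """Count targets whose alleles differ, skipping targets missing in either profile."""
    return sum(
        1
        for target in a.keys() & b.keys()
        if a[target] is not None and b[target] is not None and a[target] != b[target]
    )


def percent_good_targets(profile: Profile, scheme_targets: List[str]) -> float:
    """Share of scheme targets with a called allele, as a simple QC metric."""
    called = sum(1 for t in scheme_targets if profile.get(t) is not None)
    return 100.0 * called / len(scheme_targets)


def threshold_clusters(profiles: Dict[str, Profile], threshold: int) -> List[List[str]]:
    """Single-linkage clusters: two samples are linked if their distance <= threshold."""
    parent = {name: name for name in profiles}

    def find(x: str) -> str:
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for name in profiles:
        groups.setdefault(find(name), []).append(name)
    # Only groups with at least two samples are reported as clusters.
    return [sorted(g) for g in groups.values() if len(g) > 1]


scheme = ["t1", "t2", "t3", "t4"]
profiles = {
    "S1": {"t1": 1, "t2": 1, "t3": 1, "t4": 1},
    "S2": {"t1": 1, "t2": 2, "t3": 1, "t4": None},
    "S3": {"t1": 3, "t2": 4, "t3": 5, "t4": 6},
}
print(threshold_clusters(profiles, threshold=1))  # [['S1', 'S2']]
print(percent_good_targets(profiles["S2"], scheme))  # 75.0
```

With a threshold of 1, S1 and S2 form a cluster (distance 1), while S3 stays a singleton. Changing a user-defined threshold for an ad hoc scheme directly changes which samples are grouped and therefore which cluster alerts would be raised.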

A cgMLST scheme is usually slightly less discriminatory than an ‘SNP-like’ approach but is better suited for prospective analysis. However, when both the cgMLST genes and the accessory genome genes of such a scheme are used for comparative analysis, the discriminatory power is nearly as high as that of the ‘SNP-like’ approach.

Any interested expert with access to a well-characterized and diverse seed strain collection of a given species and to epidemiologically well-defined outbreaks can develop a potentially stable cgMLST scheme. The general procedure is described in the stable cgMLST scheme tutorial. Currently, only Ridom can make such cgMLST schemes public.

Two different approaches to defining an ad hoc cgMLST scheme are possible, depending on whether the scheme will be used to analyze a single outbreak or multiple outbreaks:

  • The single outbreak analysis approach closely mirrors the procedure typically used in SNP-calling publications. The researcher first determines the genetically closest available finished (complete or chromosome-level) genome, e.g. by in silico MLST or k-mer search, and then uses this genome as the seed genome, without any query genomes, to establish an ad hoc cgMLST scheme. This approach delivers the highest possible discriminatory power but is not well suited to being expanded to multiple outbreaks with different genetic backgrounds or to continuous monitoring.
  • The multiple outbreak analysis approach essentially follows at least chapter 3 of the stable cgMLST scheme tutorial: a well-characterized strain is taken as the seed genome and, usually, multiple query genomes are used to establish a potentially stable cgMLST scheme that is then used ad hoc with a local nomenclature.
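The k-mer search mentioned for the single outbreak approach can be sketched as an exact Jaccard comparison of canonical k-mer sets. This is a toy under stated assumptions: the function names are hypothetical, the `references` dictionary is assumed to hold only finished genomes as plain sequence strings, and dedicated tools such as Mash estimate the same kind of similarity far more efficiently with MinHash sketches instead of full k-mer sets.

```python
from typing import Dict, FrozenSet, Set

_ACGT: FrozenSet[str] = frozenset("ACGT")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int = 21) -> Set[str]:
    """Canonical k-mers: the lexicographically smaller of each k-mer and its reverse complement."""
    seq = seq.upper()
    kmers: Set[str] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i : i + k]
        if set(kmer) <= _ACGT:  # skip k-mers containing N or other ambiguity codes
            rc = kmer.translate(_COMPLEMENT)[::-1]
            kmers.add(min(kmer, rc))
    return kmers


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index |A & B| / |A | B|; 0.0 if both sets are empty."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def closest_reference(query: str, references: Dict[str, str], k: int = 21) -> str:
    """Name of the reference genome whose k-mer set is most similar to the query's."""
    q = canonical_kmers(query, k)
    return max(references, key=lambda name: jaccard(q, canonical_kmers(references[name], k)))


# Toy data: refA is nearly identical to the query, refB shares no 4-mers with it.
query = "ATGGCGTACGTTAGC"
references = {"refA": query + "A", "refB": "T" * 15}
print(closest_reference(query, references, k=4))  # refA
```

The toy uses k = 4 only to keep the sequences short; Mash, for example, uses a default k-mer size of 21 for genome comparisons.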

The ad hoc cgMLST scheme tutorial describes the single outbreak analysis approach only.