Core genome MLST (cgMLST) schemes consist of a fixed set of conserved genome-wide genes. cgMLST schemes are usually species-specific. Occasionally, for very closely related species, e.g. the Mycobacterium tuberculosis complex or Brucella species, a scheme is genus-specific. In a few cases a scheme is even only sub-species-specific, e.g. STEC. All public and stable cgMLST schemes are curated by species experts.

A cgMLST analysis with a previously fixed set of genome-wide genes was first applied during the 2011 German STEC O104:H4 outbreak (PLoS One. 2011, 6:e22751). In April 2013 we introduced the term MLST+ for such an analysis (Nat Biotechnol. 2013, 31:294). The alternative term cgMLST was first brought into the discussion in October 2013 by Maiden et al. (Nat Biotechnol. 2013, 31:294). At that time, the authors used cgMLST to mean not a fixed set of loci but rather the ‘shared’ loci of the selected isolates under study. The community and we quickly adopted the term cgMLST for a fixed set of genome-wide genes (J Clin Microbiol. 2014, 52:2479), and we therefore abandoned the term MLST+ entirely in mid-2015. However, the term core genome is somewhat misleading here: for pragmatic and reproducibility reasons, and in contrast to traditional approaches, the core genome is defined at the nucleotide level rather than the protein level, and with rather stringent search criteria in addition. The core genome determined in this way is therefore usually smaller than one obtained by traditional approaches.

In principle, such a core genome can be defined in two different ways: either by taking a very large collection of genomes and inferring a ‘soft core genome’, defined as the genes found in, e.g., 95% of all analyzed genomes (BMC Genomics. 2012, 13:577), or, in contrast, by selecting only a limited number of genomes that must reflect the genomic variability of the whole species and requiring 100% presence of the genes (‘hard core genome’). The SeqSphere+ cgMLST Target Definer follows the latter approach.
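
The difference between the two definitions can be illustrated with a minimal Python sketch. It is not the implementation of the SeqSphere+ cgMLST Target Definer, which applies further filters and search criteria not shown here. The function name `core_genome`, the parameter `min_fraction`, and the toy gene and genome names are all hypothetical, and gene presence is assumed to be already known for every genome.

```python
from typing import Dict, Set


def core_genome(presence: Dict[str, Set[str]], min_fraction: float = 1.0) -> Set[str]:
    """Return the genes found in at least `min_fraction` of the genomes.

    `presence` maps a genome name to the set of genes detected in it.
    min_fraction=1.0 gives a 'hard' core genome (100% presence);
    a lower value such as 0.95 gives a 'soft' core genome.
    """
    if not presence:
        return set()
    n_genomes = len(presence)
    # Count in how many genomes each gene occurs.
    counts: Dict[str, int] = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return {gene for gene, c in counts.items() if c / n_genomes >= min_fraction}


# Hypothetical toy data: 25 genomes, geneA in all of them,
# geneB in 24 of them (96%), geneC in 20 of them (80%).
genomes: Dict[str, Set[str]] = {f"genome_{i:02d}": {"geneA"} for i in range(25)}
for i in range(24):
    genomes[f"genome_{i:02d}"].add("geneB")
for i in range(20):
    genomes[f"genome_{i:02d}"].add("geneC")

soft_core = core_genome(genomes, min_fraction=0.95)  # 'soft core genome'
hard_core = core_genome(genomes, min_fraction=1.0)   # 'hard core genome'

print("soft core:", sorted(soft_core))
print("hard core:", sorted(hard_core))
```

In this toy data set the soft core contains geneA and geneB, whereas the hard core contains only geneA, because geneB is missing from one genome. With the hard-core approach the result depends strongly on which genomes are selected, which is why the selected genomes must reflect the genomic variability of the whole species.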