Overview

The Core Genome MLST (cgMLST) Target Definer extracts genes from one seed genome and uses BLAST to compare these genes against multiple penetration query genome sequences at the DNA level. Two sets of genes are defined: the cgMLST targets and the Accessory targets. Two tutorials are available for the cgMLST Target Definer:

Seed Genome

All gene annotations that also have a CDS annotation at the same location are used as initial seed genes. Therefore, the number of seed genes may differ slightly from the number shown in the NCBI Genome Browser (e.g., tRNA genes are not used in the cgMLST Target Definer).

cgMLST Targets

When the default filter settings are used, the cgMLST targets contain genes from the seed genome that
These targets are usually well suited for cgMLST typing.

Accessory Targets

Targets that are not included in the cgMLST targets because
are added to the Accessory targets (with the default settings). These targets can be used to gain additional discriminatory power if typing with the cgMLST targets alone is not discriminatory enough.

Settings

The cgMLST Target Definer panel lets you choose a seed genome and multiple penetration query genomes. Filters and analyzers can be selected using the corresponding tabs.

Input files

A seed genome can be defined using a GenBank or gbff file, or by downloading it from NCBI Genomes using an accession number. Penetration query genomes can be defined either by a GenBank or FASTA file or by download using accession numbers. Multiple accession numbers for penetration query genomes can be specified, separated by commas.

GenBank input

When reading GenBank/gbff files or downloading data from GenBank using an accession number, only the genes that have a CDS region are used. Genes that are not continuous and genes with a codon start > 1 are skipped. The "locus_tag" is used as the gene name.

FASTA input

If the FASTA file contains multiple sequences, all the bases are concatenated into one single sequence that is used as the genome.

Exclude files

A list of files for the exclusion of genes can be specified. Genes from the seed genome are excluded if a BLAST match with more than 90% similarity and > 100 bp length is found within the specified sequences. This feature is useful for excluding sequences from plasmids.

Filters

Genes are either discarded or moved to the Accessory targets if they do not pass the filters. Two sets of filters exist:
Taxonomic outliers

The button Find taxonomic and quality outliers can be used
To find these outliers, all non-homologous genes from the seed genome are searched for in each of the penetration query genomes using BLAST. A list reports, for every penetration query genome, how many of these genes were found and how many of the found seed genome genes contain stop codons.

Result view

The result view lists all genes that are found as cgMLST targets or Accessory targets, as well as all discarded genes. The button Create Task Templates creates Task Templates directly from the results. The Task Template target names and sequences are imported from the seed genome.
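The per-genome list described under Taxonomic outliers can be pictured with a small Python sketch. It is only an illustration, not the tool's implementation: the names `has_internal_stop`, `OutlierRow`, and `outlier_report` are hypothetical, the BLAST search is assumed to have already produced the matched sequences, and "contains stop codons" is assumed here to mean an in-frame stop codon before the final codon.

```python
from __future__ import annotations

from dataclasses import dataclass

# The three standard stop codons on the DNA level.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def has_internal_stop(cds: str) -> bool:
    """Return True if an in-frame stop codon occurs before the final codon.

    Assumption: the sequence starts in frame; trailing bases that do not
    fill a complete codon are ignored.
    """
    seq = cds.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


@dataclass
class OutlierRow:
    genome: str           # penetration query genome name
    genes_found: int      # seed genes with a match in this genome
    genes_with_stop: int  # found genes that contain an internal stop codon


def outlier_report(matches: dict[str, dict[str, str]]) -> list[OutlierRow]:
    """Build one report row per query genome.

    `matches` maps a genome name to {seed gene name: matched sequence}.
    """
    rows = []
    for genome, found in matches.items():
        with_stop = sum(has_internal_stop(seq) for seq in found.values())
        rows.append(OutlierRow(genome, len(found), with_stop))
    return rows
```

In such a report, a query genome with few genes found, or with many found genes containing stop codons, would be the kind of entry to inspect more closely.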
Target Definer Algorithm

The following pseudocode describes the algorithm:

Input: seed genome, penetration query genomes, settings

Depending on the reason why they were filtered out, some of the filtered-out genes are added to the Accessory targets (see filter description).
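The input rules stated under Settings (GenBank input, FASTA input, Exclude files) can be illustrated with a minimal Python sketch. It is not the Target Definer's actual code and does not cover the filter steps. All names (`Gene`, `select_seed_genes`, `concatenate_fasta`, `excluded_genes`) are hypothetical. The BLAST hits are assumed to be available in BLAST tabular format (outfmt 6, columns `qseqid sseqid pident length ...`), and "similarity" is assumed to correspond to the percent identity column.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Gene:
    locus_tag: str        # used as the gene name
    has_cds: bool         # a CDS annotation exists at the same location
    continuous: bool      # False for split (joined) locations
    codon_start: int = 1  # value of the codon_start qualifier


def select_seed_genes(genes: list[Gene]) -> list[str]:
    """Keep genes with a CDS, a continuous location, and codon start 1."""
    return [g.locus_tag for g in genes
            if g.has_cds and g.continuous and g.codon_start <= 1]


def concatenate_fasta(text: str) -> str:
    """Join the bases of all records in a multi-FASTA text into one sequence."""
    return "".join(line.strip() for line in text.splitlines()
                   if line.strip() and not line.startswith(">"))


def excluded_genes(blast_tabular: str,
                   min_identity: float = 90.0,
                   min_length: int = 100) -> set[str]:
    """Return seed genes with a hit above both thresholds in the exclude files.

    Assumption: percent identity stands in for the documented "similarity".
    """
    excluded = set()
    for line in blast_tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        gene, identity, length = fields[0], float(fields[2]), int(fields[3])
        # Both comparisons are strict: "more than 90%" and "> 100 bp".
        if identity > min_identity and length > min_length:
            excluded.add(gene)
    return excluded
```

For example, a hit with 95% identity over 150 bp would exclude the gene, while a 99% hit over only 80 bp, or a hit with exactly 90% identity, would not.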