Overview

Local Single Linkage Clustering (SLC) IDs are assigned based on the cgMLST profiles of samples (using the pairwise ignore missing values approach) and can be defined for a project. SLC IDs are plain integers that increase with every new cluster. Local SLC IDs can be used to automatically detect and label clusters of samples with similar allelic cgMLST profiles, and to build a kind of 'nomenclature' based only on one's own samples. Local SLC IDs can be defined separately for different projects, and multiple hierarchical Local SLC IDs can be defined per project.
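The 'pairwise ignore missing values' approach can be illustrated with a minimal Python sketch. For illustration only, it assumes that a cgMLST profile is a list with one allele number per target and `None` for a missing value. A target is skipped whenever either of the two samples lacks a call, and the distance counts the remaining targets whose alleles differ. The function name `pairwise_distance` is hypothetical and does not describe SeqSphere's internal implementation.

```python
from typing import Optional, Sequence

Profile = Sequence[Optional[int]]  # assumed encoding: allele number per target, None = missing


def pairwise_distance(a: Profile, b: Profile) -> int:
    """Count differing alleles, ignoring every target missing in either profile."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same cgMLST targets")
    return sum(
        1
        for x, y in zip(a, b)
        if x is not None and y is not None and x != y
    )


if __name__ == "__main__":
    s1 = [1, 4, 7, None, 2]
    s2 = [1, 5, 7, 3, None]
    print(pairwise_distance(s1, s2))  # 1: targets 4 and 5 are ignored, only target 2 differs
```

Because missing targets are ignored per pair, two samples with many missing values can still appear very close, which is one reason why samples with too many missing values are excluded from clustering.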

Defining cgMLST Local SLC IDs

Figure: Project Editor with the button to edit the settings for Local SLC IDs
Figure: Editor for Local SLC ID settings

Local SLC IDs are created and managed with the corresponding button in the project editor, below the task templates section. The button is only enabled if a cgMLST task template has been added to the project. The button icon is grey if no Local SLC ID Definition has been defined for the project yet; otherwise it is blue. Clicking the Add cgMLST Local SLC ID Definition button allows a new cgMLST Local SLC ID to be defined.

A cgMLST Local SLC ID Definition contains the following settings:

  • Name of cgMLST Local SLC ID Definition
A name that can be used as a reference for this definition. The name can be left empty if only one Local SLC ID is used for the project.
  • cgMLST Task Template for Distance
If the project contains multiple cgMLST Task Templates, the one used for distance calculation can be selected here.
  • Allelic Profile Distance Criteria
Defines the allelic distance threshold for Local Single Linkage Clustering. By default, the clustering threshold of the cgMLST Task Template is used. Alternatively, a multiple of the clustering threshold or an absolute allelic distance can be chosen.
By default, all samples with more than 10% missing values are excluded from clustering.
  • Other Settings
Here the database field in which the assigned Local SLC IDs are stored can be selected. By default, they are stored in the field 'Local SLC ID' (in the section 'Epi Characteristic'). If multiple Local SLC IDs are defined for a project (e.g., hierarchical ones), they must be stored in different fields. Only the default field 'Local SLC ID' and/or newly created fields can be used to store Local SLC IDs; a newly created field must be of type 'Textfield' and must be located in the 'Epi Characteristic' section or in a new section. The fields used for storing Local SLC IDs are automatically added to the default comparison table fields of this project.
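The three threshold options and the default missing-value filter can be sketched in Python as follows. The function names `effective_threshold` and `include_in_clustering`, the mode strings, and the `None` encoding of missing alleles are hypothetical; in SeqSphere these settings are chosen in the dialog, not in code. Only the rule of excluding samples with more than 10% missing values is taken from the text above.

```python
from typing import Optional, Sequence

Profile = Sequence[Optional[int]]  # assumed encoding: None marks a missing allele


def effective_threshold(template_threshold: int, mode: str = "template", value: float = 1.0) -> float:
    """Resolve the allelic distance threshold (hypothetical mode names)."""
    if mode == "template":  # default: clustering threshold of the cgMLST Task Template
        return float(template_threshold)
    if mode == "multiple":  # a multiple of the template's clustering threshold
        return template_threshold * value
    if mode == "absolute":  # an absolute allelic distance
        return float(value)
    raise ValueError(f"unknown mode: {mode}")


def include_in_clustering(profile: Profile, max_missing: float = 0.10) -> bool:
    """Default rule: samples with MORE than 10% missing values are excluded."""
    missing = sum(1 for allele in profile if allele is None)
    return missing / len(profile) <= max_missing


if __name__ == "__main__":
    print(effective_threshold(10, "multiple", 2))          # 20
    print(include_in_clustering([1] * 9 + [None]))         # True: exactly 10% missing
    print(include_in_clustering([1] * 8 + [None, None]))   # False: 20% missing
```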

When OK is clicked in the Edit cgMLST Local SLC ID Definitions dialog and Save & Close in the project dialog, the Local SLC ID Definition is stored and becomes active. At the same time, any Local SLC ID entries that already exist in the project are written to the History log as Predefined ID. A dialog then asks whether Local SLC IDs should be determined for all existing samples of the project. In this dialog, the option Remove potentially existing SLC IDs and reset ID counter to 1 is selected by default. If this option is used, all existing Local SLC ID field entries are deleted, and Local SLC IDs are merged without concatenating the merged IDs (only the lower ID 'survives'). If the option is deselected, existing Local SLC ID field entries are NOT deleted, and merged Local SLC IDs get a 'concatenated name' built from all involved IDs.

Hint: If a user has already defined their own Local SLC IDs manually for some or all samples of a project, wants these IDs to be extended automatically for new project samples by the mechanism described here, and wants these predefined IDs to be logged in the history (see below), then the IDs should first be imported into the SLC ID database field, before a new cgMLST Local SLC ID Definition is stored.

Adding and Deleting cgMLST Local SLC ID Definitions

A Local SLC ID Definition can be added or deleted at any time by selecting a project in the project manager window and clicking the Edit cgMLST Local SLC ID Definitions button of this project. To add a Local SLC ID Definition, click the plus icon, then define and store the new Local SLC ID. To delete a Local SLC ID Definition, select the corresponding tab and click the minus icon. The definition is deleted once OK is clicked in the Edit cgMLST Local SLC ID Definitions dialog and Save & Close in the project dialog. A dialog then asks whether existing ID entries in the corresponding database field should also be deleted.

cgMLST Local SLC ID Assignment

If Local SLC IDs are defined for a project, the clustering is automatically performed when a sample is added to this project via either the Pipeline Mode or the command Process Assembled Genome Data. If a cluster is found, the SLC ID of the cluster is stored in the defined sample field (by default 'Local SLC ID' in section 'Epi Characteristic').

For each sample that is added to the project, all stored samples of the same project are searched for allelic profiles within the defined distance threshold:

  • if no samples were found within the threshold, no Local SLC ID is assigned;
  • else, if samples were found and none of them has a Local SLC ID yet, a new ID is established and assigned to the new sample and to all found samples within the threshold (in the History log labeled as Established new ID action);
  • else, if samples were found and all of them have the same Local SLC ID (or no ID), this SLC ID is assigned to the new sample and to all found samples that have no ID yet (in the History log labeled as Added to ID action); and
  • else, if samples were found and they have different Local SLC IDs, these IDs are merged. The merged Local SLC ID is assigned to the new sample, to all found samples within the threshold, and to all stored samples of the project that had one of the merged SLC IDs assigned (in the History log labeled as Merged IDs action). The new ID name is created by concatenating all merged IDs with the delimiter "/".
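The four assignment rules above can be modeled in a short Python sketch. The class `LocalSlcAssigner`, the distance function, and the `None` encoding of missing alleles are hypothetical. Two further details are assumptions not stated in the text: "within the threshold" is taken to mean distance <= threshold, and merged IDs are joined in sorted order.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Profile = Sequence[Optional[int]]  # assumed encoding: None marks a missing allele


def ignore_missing_distance(a: Profile, b: Profile) -> int:
    """Differing alleles, skipping targets missing in either profile."""
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


class LocalSlcAssigner:
    """Hypothetical model of the incremental Local SLC ID assignment rules."""

    def __init__(self, threshold: float,
                 distance: Callable[[Profile, Profile], int] = ignore_missing_distance) -> None:
        self.threshold = threshold
        self.distance = distance
        self.profiles: Dict[str, Profile] = {}
        self.slc_id: Dict[str, Optional[str]] = {}
        self.next_id = 1
        self.history: List[Tuple[str, str, str]] = []  # (action, SLC ID, sample)

    def add_sample(self, sample: str, profile: Profile) -> Optional[str]:
        # Stored samples within the threshold (assumption: distance <= threshold).
        hits = [s for s, p in self.profiles.items()
                if self.distance(profile, p) <= self.threshold]
        self.profiles[sample] = profile
        self.slc_id[sample] = None
        if not hits:
            return None  # no similar sample found: no Local SLC ID is assigned
        existing = {self.slc_id[s] for s in hits if self.slc_id[s] is not None}
        if not existing:
            # None of the found samples has an ID yet: establish a new one.
            new_id, action = str(self.next_id), "Established new ID"
            self.next_id += 1
            targets = hits
        elif len(existing) == 1:
            # All found samples share one ID (or have none): add to that ID.
            new_id, action = existing.pop(), "Added to ID"
            targets = [s for s in hits if self.slc_id[s] is None]
        else:
            # Different IDs found: merge, joined with "/" (sorted order is an assumption).
            new_id, action = "/".join(sorted(existing)), "Merged IDs"
            targets = hits + [s for s, i in self.slc_id.items() if i in existing]
        for s in [sample] + targets:
            if self.slc_id[s] != new_id:  # skip duplicates and samples already carrying the ID
                self.slc_id[s] = new_id
                self.history.append((action, new_id, s))
        return new_id


if __name__ == "__main__":
    slc = LocalSlcAssigner(threshold=2)
    for name, profile in [("A", [1, 1, 1]), ("B", [1, 1, 2]), ("C", [5, 5, 5]),
                          ("D", [5, 5, 6]), ("E", [1, 5, 2])]:
        print(name, slc.add_sample(name, profile))
    # A None, B 1, C None, D 2, E 1/2 (E links both clusters, so IDs 1 and 2 are merged)
```

In this run, sample E lies within the threshold of both clusters. It therefore triggers a merge, and all five samples end up with the ID "1/2".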

Local SLC IDs are only assigned to samples that were found in the search, not to samples that are merely similar to the found ones. Therefore, when a new Local SLC ID is defined for a project that already contains samples, it is recommended to perform a cgMLST Local SLC ID assignment for the existing samples immediately after finishing the definition (see below).

Important: The field chosen for storing the Local SLC IDs can be edited manually. Such edits are not stored in the History, and because of merging, SeqSphere may later replace the field value with a merged Local SLC ID.

History of cgMLST Local SLC ID Assignments

Figure: History for a Local SLC ID

The Show History button in the editor panel of the Local SLC ID definition lists the history of all SLC ID assignments in a table, which can also be exported. By default, the table is sorted by creation time stamp (newest entry on top). Selecting one or more rows, right-clicking, and choosing the command Show All Samples of Selected Local SLC ID(s) in Comparison Table opens a comparison table with all involved samples. The second command, Delete Selected Local SLC ID(s) in All Samples with those IDs, empties the Local SLC ID entries of those samples (e.g., for multiple isolates of the same patient). The table rows can be filtered by entering either a Local SLC ID or a Sample ID. Local SLC IDs that are entered or edited manually, or imported, after a new Local SLC ID definition has been stored are not logged in the history. Only SLC IDs that already exist when a new Local SLC ID definition is stored are logged in the History.
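The default ordering and the filter of the history table can be modeled in a few lines of Python. The `HistoryEntry` record and the `history_view` function are hypothetical, and the assumption that the filter is an exact match on either the Local SLC ID or the Sample ID column is not confirmed by the text. The action labels are the ones named in this document.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class HistoryEntry:
    created: datetime
    action: str     # e.g. "Established new ID", "Added to ID", "Merged IDs", "Predefined ID"
    slc_id: str
    sample_id: str


def history_view(entries: List[HistoryEntry], query: Optional[str] = None) -> List[HistoryEntry]:
    """Newest entry first; optionally keep only rows matching a Local SLC ID or Sample ID."""
    rows = sorted(entries, key=lambda e: e.created, reverse=True)
    if query:
        # Assumption: exact match on either column.
        rows = [e for e in rows if query in (e.slc_id, e.sample_id)]
    return rows


if __name__ == "__main__":
    log = [
        HistoryEntry(datetime(2024, 1, 1), "Established new ID", "1", "S1"),
        HistoryEntry(datetime(2024, 2, 1), "Added to ID", "1", "S2"),
        HistoryEntry(datetime(2024, 3, 1), "Established new ID", "2", "S3"),
    ]
    print([e.sample_id for e in history_view(log)])       # ['S3', 'S2', 'S1']
    print([e.sample_id for e in history_view(log, "1")])  # ['S2', 'S1']
```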

Perform cgMLST Local SLC ID Assignment for Existing Samples

A dedicated button in the toolbar of the project manager window performs a Local SLC ID assignment for the existing samples of a selected project at any time. If multiple Local SLC ID definitions are available for the project, a dialog first asks for which definition the assignment should be performed. The option Remove potentially existing SLC IDs and reset ID counter to 1 is selected by default. If this option is used, all existing history logs and Local SLC ID field entries are deleted (usually this field should be empty anyway; however, a user could have entered or imported field values manually), and a single linkage clustering of all project samples is started with the ID counter beginning at 1. When Local SLC IDs are merged, the lower, i.e. earlier, ID is used as the ID for the merged samples.

If the option Remove potentially existing SLC IDs and reset ID counter to 1 is deselected, existing Local SLC ID field entries are not removed but are used for ID assignment. If a new ID number must be assigned, the counter starts at one higher than the highest existing ID number. Furthermore, when IDs are merged, a 'concatenated name' built from all involved IDs is applied.
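The two behaviors of the Remove potentially existing SLC IDs and reset ID counter to 1 option can be contrasted in a small Python sketch. The functions `next_counter` and `merged_name` are hypothetical. The sketch assumes that IDs are numeric, that merged IDs use the "/" delimiter described above, and that concatenated names are ordered by their leading number. That ordering is an assumption.

```python
from typing import Iterable, Optional


def next_counter(existing_ids: Iterable[Optional[str]], reset: bool) -> int:
    """First ID number handed out when (re)running the assignment."""
    if reset:
        return 1  # option selected: counter restarts at 1
    numbers = [
        int(part)
        for value in existing_ids
        if value
        for part in value.split("/")  # merged IDs such as "3/7" contribute 3 and 7
        if part.isdigit()
    ]
    return max(numbers, default=0) + 1  # one higher than the highest existing number


def merged_name(ids: Iterable[str], reset: bool) -> str:
    """Name of a merged cluster under the two options (numeric IDs assumed)."""
    unique = sorted(set(ids), key=lambda i: int(i.split("/")[0]))
    if reset:
        return unique[0]  # only the lower (earlier) ID survives
    return "/".join(unique)  # concatenated name built from all involved IDs


if __name__ == "__main__":
    print(next_counter(["2", "3/7", None], reset=False))  # 8
    print(merged_name(["3", "1", "7"], reset=True))       # 1
    print(merged_name(["3", "12"], reset=False))          # 3/12
```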

Searching and Analyzing Samples with (Potentially) Identical Local SLC IDs

When a sample is loaded, samples with a similar cgMLST profile can be searched for, optionally recursively, using a given threshold. To do so, click the Search Similar Samples in Database icon. A default similarity search only finds samples that are within the defined allelic threshold of the query sample. If the recursive option is turned on, all samples that are within the defined allelic threshold of any other found sample are also returned; this search is repeated until no additional samples are found. Similarly, the command Tools | Search Similar Samples in Database can be invoked from the main menu to find potentially matching samples once one or more query samples are selected.
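The difference between a default and a recursive similarity search can be shown with a breadth-first expansion in Python. The function `search_similar`, the in-memory `database` dictionary, and the `None` encoding of missing alleles are hypothetical. The sketch assumes that "within the threshold" means distance <= threshold.

```python
from collections import deque
from typing import Callable, Dict, Optional, Sequence, Set

Profile = Sequence[Optional[int]]  # assumed encoding: None marks a missing allele


def ignore_missing_distance(a: Profile, b: Profile) -> int:
    """Differing alleles, skipping targets missing in either profile."""
    return sum(1 for x, y in zip(a, b) if x is not None and y is not None and x != y)


def search_similar(query: Profile, database: Dict[str, Profile], threshold: float,
                   recursive: bool = False,
                   distance: Callable[[Profile, Profile], int] = ignore_missing_distance) -> Set[str]:
    """Samples within the threshold of the query; if recursive, also of any sample found so far."""
    found = {s for s, p in database.items() if distance(query, p) <= threshold}
    if not recursive:
        return found
    queue = deque(found)  # repeat the search from each new hit until nothing new appears
    while queue:
        current = database[queue.popleft()]
        for s, p in database.items():
            if s not in found and distance(current, p) <= threshold:
                found.add(s)
                queue.append(s)
    return found


if __name__ == "__main__":
    db = {"X": [0, 0, 0, 1], "Y": [0, 0, 1, 1], "Z": [0, 1, 1, 1], "W": [9, 9, 9, 9]}
    query = [0, 0, 0, 0]
    print(sorted(search_similar(query, db, threshold=1)))                  # ['X']
    print(sorted(search_similar(query, db, threshold=1, recursive=True)))  # ['X', 'Y', 'Z']
```

Here Y and Z are more than one allele away from the query. They are still returned by the recursive search because each one is within one allele of a sample that was already found.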

If samples are opened in a Comparison Table and one or more samples are selected, a (recursive) similarity search for those samples can also be conducted. To do so, click the Add Additional Samples to Table by Allelic Similarity icon or select the corresponding Data menu command. If the Local SLC IDs are stored in the database field Local SLC ID, the Data menu command or icon Add Additional Samples to Table with Same SLC ID immediately loads all stored samples of the project with identical SLC IDs into the comparison table. If the SLC IDs of interest are stored in a different database field, the Data | Add Additional Samples to Table by Metadata Search command or icon must be used. Select the database field that holds the SLC IDs, enter the SLC ID values of interest, and run the search.