Overview

Screenshot of the Minimum Spanning Tree Window

A Minimum Spanning Tree (MST) can be calculated for the Samples of a Comparison Table. The currently selected columns for distance calculation are used to create the MST. Multiple Samples can be represented by a single node based on their genotypes. The links between the nodes are based on the distance of the genotypes. The MST is calculated using a modified version of Kruskal's algorithm (Kruskal 1956). See Francisco et al. 2009 for a description of the MST algorithm. The layout is generated using a force-based algorithm.
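As a rough illustration of the two steps described above (collapsing Samples with identical genotypes into one node, then linking nodes by distance), the following Python sketch uses the plain textbook version of Kruskal's algorithm. It is not the modified algorithm described by Francisco et al. 2009 that the software uses, and all names in it (`hamming`, `group_by_genotype`, `kruskal_mst`, the sample data) are hypothetical.

```python
from itertools import combinations

def hamming(a, b):
    """Number of columns in which two genotypes differ."""
    return sum(x != y for x, y in zip(a, b))

def group_by_genotype(samples):
    """Map each distinct genotype to the names of the Samples that share it."""
    nodes = {}
    for name, genotype in samples.items():
        nodes.setdefault(tuple(genotype), []).append(name)
    return nodes

def kruskal_mst(genotypes):
    """Textbook Kruskal: visit edges by increasing distance and keep an
    edge whenever it connects two components that are still separate."""
    parent = list(range(len(genotypes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted(
        (hamming(genotypes[i], genotypes[j]), i, j)
        for i, j in combinations(range(len(genotypes)), 2)
    )
    tree = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            tree.append((i, j, dist))
    return tree

# Hypothetical example data: four Samples, four columns each.
samples = {
    "S1": [1, 1, 1, 1],
    "S2": [1, 1, 1, 1],  # same genotype as S1, so it shares a node with S1
    "S3": [1, 1, 1, 2],
    "S4": [1, 2, 2, 2],
}
nodes = group_by_genotype(samples)
genotypes = list(nodes)
mst = kruskal_mst(genotypes)
```

In this toy data set the four Samples collapse into three nodes, and the tree needs two links to connect them.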

All Samples in a node can be selected by clicking the node, and the tree nodes can be dragged with the mouse. The context menu can be opened by clicking a node with the right mouse button. Because of the layout algorithm used, lines may intersect with nodes or other lines. This can be corrected manually by dragging the tree nodes to rearrange them. Automatic graph relaxing (button Enable/Disable Automatic Graph Relaxing) applies the force-based algorithm to update the layout continuously when nodes are moved. The context menu of a node can be used to lock the node at its current position.

Groups can be represented by the color of the nodes and the labels. If the Samples in a node belong to different groups, the size of each circle arc represents the number of Samples (strains) in the corresponding group.

Only one linked MST window can be displayed at a time. This window is updated when the data or the columns for distance calculation are changed. When an MST window is disconnected from the Comparison Table (menu MST | Disconnect MST from Table), an unlimited number of MST windows can be displayed, and data changes will not update the MST in the disconnected windows.

Distance Calculation

The MST is calculated by using the Sample data for the columns that are marked for distance calculation (green header background).

Empty values or values that start with a ? are treated as missing values. Depending on the settings for Missing Values in MST, these missing values are either treated as a category of their own or ignored during pairwise comparison. The Treatment of Missing Data command can be used to change the settings.

For a small number of columns for distance calculation (e.g., MLVA or MLST data), the option "missing values are an own category" is recommended. For a larger number of columns (e.g., cgMLST with thousands of targets), the option "pairwise ignore missing values" is recommended.
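The difference between the two options can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, values are assumed to be strings, and the sketch assumes that in the "own category" mode two missing values count as equal while a missing value and a real value count as different. The software's exact rules may differ.

```python
def is_missing(value):
    """Empty values and values starting with '?' count as missing."""
    return value == "" or value.startswith("?")

def distance(a, b, ignore_missing):
    """Count the differing columns between two genotypes (lists of strings)."""
    diff = 0
    for x, y in zip(a, b):
        x_missing, y_missing = is_missing(x), is_missing(y)
        if x_missing or y_missing:
            if ignore_missing:
                continue  # the column is skipped for this pair only
            # Assumption: all missing values form one shared category.
            if x_missing != y_missing:
                diff += 1
        elif x != y:
            diff += 1
    return diff

# Hypothetical genotypes with five columns each.
a = ["1", "2", "", "4", "?"]
b = ["1", "3", "5", "?", "?9"]

pairwise_ignore = distance(a, b, ignore_missing=True)
own_category = distance(a, b, ignore_missing=False)
```

With these two genotypes, only the second column contributes in the pairwise-ignore mode, whereas the own-category mode also counts the two columns in which exactly one value is missing.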

Note that the option pairwise ignore missing values may result in problems in the MST when a Sample contains many missing values. In this case, many or all of the MST nodes might be merged. It is recommended to remove Samples that have missing values in more than 10% of the columns for distance calculation before calculating an MST.
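The recommended pre-filtering step could look like the following Python sketch if the data were prepared outside the software. The helper names and the example data are hypothetical; the 10% limit is the value recommended above, and "more than 10%" is read as a strict comparison.

```python
def is_missing(value):
    """Empty values and values starting with '?' count as missing."""
    return value == "" or value.startswith("?")

def missing_fraction(genotype):
    """Share of columns with a missing value."""
    return sum(is_missing(v) for v in genotype) / len(genotype)

def split_by_missing(samples, max_fraction=0.10):
    """Separate Samples whose missing fraction exceeds max_fraction."""
    kept, removed = {}, {}
    for name, genotype in samples.items():
        target = removed if missing_fraction(genotype) > max_fraction else kept
        target[name] = genotype
    return kept, removed

# Hypothetical Samples with 20 columns each.
samples = {
    "S1": ["1"] * 20,
    "S2": ["1"] * 18 + ["", "?"],        # 2 of 20 missing: exactly 10%, kept
    "S3": ["1"] * 17 + ["", "?", "?7"],  # 3 of 20 missing: 15%, removed
}
kept, removed = split_by_missing(samples)
```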

MST Clusters

Example of an MST Cluster with 14 Samples, highlighted by a gray background color. The cluster label shows the name of the cluster, 'MST Cluster 2'. The cluster distance threshold defaults to 10 allele differences, i.e., the clustering threshold for L. monocytogenes.

MST Clusters can be used to cluster Samples in the MST by a user-specified (allelic) distance threshold. Every Sample in an MST cluster has a distance equal to or below this threshold to at least one other member of the cluster. However, members at opposite ends of a cluster may have a distance to each other that is higher than the threshold (single-linkage clustering).
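The single-linkage behavior can be illustrated with a small Python sketch. It is a generic single-linkage grouping over a pairwise distance table, not the software's implementation, and the function name, node names, and distances are hypothetical.

```python
from itertools import combinations

def single_linkage_clusters(names, dist, threshold):
    """Group names so that each member is within the threshold of at
    least one other member; singletons are not reported as clusters."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if dist[frozenset((a, b))] <= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for n in names:
        clusters.setdefault(find(n), []).append(n)
    return [members for members in clusters.values() if len(members) > 1]

# Hypothetical pairwise allele distances between four nodes.
names = ["A", "B", "C", "D"]
dist = {
    frozenset(("A", "B")): 8,
    frozenset(("B", "C")): 9,
    frozenset(("A", "C")): 17,
    frozenset(("A", "D")): 30,
    frozenset(("B", "D")): 25,
    frozenset(("C", "D")): 20,
}
clusters = single_linkage_clusters(names, dist, threshold=10)
```

Here A and C end up in the same cluster through B, although their direct distance (17) is above the threshold of 10. D is more than 10 away from every other node and stays outside the cluster.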

For comparison tables with cgMLST task templates, the clustering distance threshold is used as the default MST cluster distance threshold. For public cgMLST schemes, this is equal to the CT distance.

The settings for the MST clusters can be modified in the first tab of the MST Options.

Menu

MST

  • Save Comparison Table Snapshot including MST
  • Export MST: Exports the MST in one of the following formats:
    • EMF: Exports a vector graphic image that can be easily inserted into PowerPoint or Word documents and can be resized without loss of quality.
    • SVG: Exports a vector graphic image that can be resized without loss of quality. The SVG format is well suited for creating high-resolution graphics. See Image File Formats for more information.
    • PNG: Exports a raster image in the currently displayed resolution.
    On the right side of the output file chooser dialog, you can define whether the legend is added to the exported image or not, or whether the legend text is exported into a separate image file.
  • Copy MST to Clipboard: Copies the MST to the clipboard as a raster image in PNG format.
  • Copy Legend Groups to Clipboard: Copies the color groups of the legend to the clipboard as a raster image in PNG format.
  • Copy Legend Comments to Clipboard: Copies the comment lines of the legend to the clipboard as plain text.
  • Enable/Disable Automatic Graph Relaxing: The MST will slowly change its form until the force-based layout is optimized. This is useful when single nodes are moved by pressing and dragging the mouse.
  • MST Statistic: Displays statistical information about the MST nodes and clusters, and shows an exportable table of the MST cluster members.
  • Treatment of Missing Data: Defines whether missing data is handled as a category of its own or whether columns that contain missing values for at least one of the two genotypes are ignored during pairwise distance calculation for the MST. See Missing Values in MST.
  • Disconnect MST from Table: Changes in the table will no longer update the MST. A new MST window can be opened after an MST has been disconnected. This option can be used to compare MSTs for different data or settings. Note that the labels can no longer be changed after the MST has been disconnected.

View

Show Connection Lines dialog
  • Center MST: Fits the complete MST in the window.
  • MST Options: Opens the MST Options window, where the display of the MST and the definition of clusters can be customized.
  • Choose Column for Label: Selects the column that is used to label the MST nodes. The View Options can be used to abbreviate the label if it is too long.
  • Show Connection Lines: Adds additional connection lines to the tree, which turns the tree into a graph. This function can be used to check whether the tree is reliable.
    Two modes are available. Connection lines can be added
    • between all tree nodes up to a selected distance or
    • between selected tree nodes. If only one node is selected, connection lines to all other nodes are added. If more than one node is selected, pairwise connection lines between the selected nodes are added.
  • Reset Default MST Options: Sets the MST options to default values.
  • Export MST Options: Exports the MST options to a file.
  • Import MST Options: Imports the MST options from a file.
  • Name Clusters by Columns: Names MST clusters by the value in a column. The clusters are named after the value of the first Sample in the cluster.
  • Create Groups for Clusters: Replaces existing groups with a new group for each MST cluster.
  • Sort by Selected Nodes: If checked, the nodes that are selected in the MST window are sorted to the top of the comparison table.
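The two modes of Show Connection Lines listed above can be sketched in Python as follows. This is a hedged illustration only: the function names and data are hypothetical, and the sketch assumes that pairs already linked in the tree do not get an additional line.

```python
from itertools import combinations

def lines_up_to_distance(nodes, dist, tree_edges, max_distance):
    """Mode 1: link all node pairs whose distance is at most max_distance."""
    return [
        (a, b)
        for a, b in combinations(nodes, 2)
        if frozenset((a, b)) not in tree_edges
        and dist[frozenset((a, b))] <= max_distance
    ]

def lines_for_selection(nodes, tree_edges, selected):
    """Mode 2: one selected node is linked to all other nodes; several
    selected nodes are linked pairwise among themselves."""
    if len(selected) == 1:
        pairs = [(selected[0], n) for n in nodes if n != selected[0]]
    else:
        pairs = list(combinations(selected, 2))
    # Assumption: pairs that are already tree links are skipped.
    return [p for p in pairs if frozenset(p) not in tree_edges]

# Hypothetical tree A-B-C-D with pairwise distances between all nodes.
nodes = ["A", "B", "C", "D"]
tree_edges = {frozenset(("A", "B")), frozenset(("B", "C")), frozenset(("C", "D"))}
dist = {
    frozenset(("A", "B")): 1,
    frozenset(("B", "C")): 2,
    frozenset(("C", "D")): 2,
    frozenset(("A", "C")): 2,
    frozenset(("A", "D")): 5,
    frozenset(("B", "D")): 3,
}

by_distance = lines_up_to_distance(nodes, dist, tree_edges, max_distance=3)
one_selected = lines_for_selection(nodes, tree_edges, ["A"])
two_selected = lines_for_selection(nodes, tree_edges, ["B", "D"])
```

An extra line such as A to C with a distance of 2 shows that C could have been attached to A just as well as to B, which is the kind of ambiguity this function helps to reveal.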

Panels

  • Show ... Panel: Toggles the visibility of the following panels: Cluster Tree panel, Group panel, Zoom panel.

Context menu

Most of the functions of the context menu are only available if a node is selected or a node was clicked.

  • Select Cluster: Only available if a node in a cluster was clicked. Selects all nodes in the cluster.
  • Set Cluster Color: Only available if a node in a cluster was clicked. Changes the background color of the cluster.
  • Set Cluster Name: Only available if a node in a cluster was clicked. Changes the name of the cluster.
  • Add to Exclude List: Only available if at least one node is selected. Moves the selected nodes to the Comparison Table exclude list.
  • Set Group: Only available if at least one node is selected. Changes the group of all Samples in the selected nodes.
  • Lock Position: Only available if a node or a cluster label was clicked. The node can no longer be moved by the layout algorithm, but it can still be moved by dragging with the mouse.

References

  • Kruskal JB. On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. Proc Am Math Soc 1956, 7:48–50
  • Francisco AP, Bugalho M, Ramirez M, Carriço JA. Global optimal eBURST analysis of multilocus typing data using a graphic matroid approach. BMC Bioinform 2009, 10:152