When phylogenetic trees (NJ or UPGMA) are calculated from MLST+ data in a comparison table, the distance between two Samples is based on the number of identical and different alleles. The number of SNPs between two different alleles is not used, because this would overweight cross-over (recombination) events, in which a single event can introduce many SNPs into one locus <ref>However, it is possible to export the concatenated allele sequences of the Samples, which allows a tree to be built from the sequences.</ref>.
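
The following sketch shows the difference between counting alleles and counting SNPs for two hypothetical Samples. The loci, sequences, and function names are invented for illustration. At locus L2, Sample B is assumed to carry an allele imported by recombination that differs at six positions.

```python
# Hypothetical allele sequences for three loci in two Samples.
# Sample B carries an assumed recombinant allele at locus "L2".
sample_a = {"L1": "ACGTACGT", "L2": "AAAAAAAA", "L3": "GGGGCCCC"}
sample_b = {"L1": "ACGTACGT", "L2": "TTTTTTAA", "L3": "GGGGCCCA"}


def allele_differences(a, b):
    """Count loci whose alleles differ; each locus counts at most once."""
    return sum(1 for locus in a if a[locus] != b[locus])


def snp_differences(a, b):
    """Count differing positions across the concatenated allele sequences."""
    seq_a = "".join(a[locus] for locus in sorted(a))
    seq_b = "".join(b[locus] for locus in sorted(b))
    return sum(1 for x, y in zip(seq_a, seq_b) if x != y)


print(allele_differences(sample_a, sample_b))  # 2
print(snp_differences(sample_a, sample_b))     # 7
```

The allele-based count treats the recombinant locus as a single difference, whereas the SNP count over the concatenated sequences is dominated by it.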

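We use Nei's D<sub>A</sub> distance <ref>Nei, M., F. Tajima, and Y. Tateno (1983). Accuracy of estimated phylogenetic trees from molecular data. II. Gene frequency data. ''J. Mol. Evol.'' 19:153-170.
Formula: <math>D_A = 1-\frac{1}{r}\sum_{j=1}^{r}\sum_{i=1}^{m_j}\sqrt{x_{ij}y_{ij}}</math>, where <math>r</math> is the number of loci, <math>m_j</math> is the number of alleles at the <math>j</math>-th locus, and <math>x_{ij}</math> and <math>y_{ij}</math> are the frequencies of the <math>i</math>-th allele at the <math>j</math>-th locus in populations X and Y.</ref>. This distance compares the allele frequencies of two populations. If only two Samples are compared (e.g. when building a distance matrix), every allele frequency is either 0 or 1, so <math>\sqrt{x_{ij}y_{ij}}</math> is 1 for a shared allele and 0 otherwise. The formula then simplifies to <math>D = 1-\frac{s}{r} = \frac{r-s}{r}</math>, i.e. the number of loci with different alleles divided by the total number of loci <math>r</math>, where <math>s</math> is the number of loci with identical alleles.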
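
The formula can be sketched in Python as follows. The input layout (one dictionary of allele frequencies per locus) and the function name `nei_da` are illustrative choices, not the software's internal representation.

```python
import math


def nei_da(freq_x, freq_y):
    """Nei's D_A distance between two populations X and Y.

    freq_x, freq_y: one dict per locus j, mapping allele -> frequency (x_ij, y_ij).
    """
    r = len(freq_x)
    if r == 0 or r != len(freq_y):
        raise ValueError("both populations must cover the same, non-empty set of loci")
    total = 0.0
    for fx, fy in zip(freq_x, freq_y):
        # Inner sum over alleles i at locus j; an allele absent from Y contributes 0.
        total += sum(math.sqrt(p * fy.get(allele, 0.0)) for allele, p in fx.items())
    return 1.0 - total / r


# Two loci: the populations agree at locus 1 and partly overlap at locus 2.
pop_x = [{"1": 1.0}, {"4": 0.5, "5": 0.5}]
pop_y = [{"1": 1.0}, {"4": 0.5, "6": 0.5}]
print(nei_da(pop_x, pop_y))  # 0.25
```

A single Sample corresponds to a frequency of 1.0 for its allele at each locus. For two such inputs, e.g. `nei_da([{"3": 1.0}, {"7": 1.0}], [{"3": 1.0}, {"8": 1.0}])`, the function returns 0.5, the fraction of loci with different alleles.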

For example, if the scheme has 230 loci and two Samples share the same allele at 228 of them, the distance is 1/230 * (230 - 228) = 2/230 = 0.00869565, a low value indicating that the two Samples are almost identical.
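
The two-Sample case only needs a count of differing loci. This minimal sketch reproduces the 230-locus example. The allele numbers and the function name `allele_distance` are made up for illustration.

```python
def allele_distance(profile_a, profile_b):
    """Fraction of loci with different alleles (Nei's D_A for two single Samples)."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same loci")
    r = len(profile_a)
    different = sum(1 for a, b in zip(profile_a, profile_b) if a != b)
    return different / r


# 230 loci with hypothetical allele numbers; the Samples differ at two loci.
a = list(range(1, 231))
b = a.copy()
b[0] = 999
b[1] = 998
print(round(allele_distance(a, b), 8))  # 0.00869565
```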

Note that missing data require special handling. Current sequencing and assembly technology may leave some loci without a called allele (missing data). By default, when building a distance matrix, our software ignores every locus whose value is missing in either of the two Samples being compared <ref>See the available video tutorial on the handling of missing data.</ref>.
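
A pairwise distance matrix that ignores missing loci might be sketched as follows. Missing alleles are represented as `None`. The text does not state how ignored loci affect <math>r</math>, so this sketch assumes they are dropped from both the count of differences and the total number of compared loci. The function names and profiles are hypothetical.

```python
from itertools import combinations


def pairwise_distance(profile_a, profile_b):
    """Fraction of differing loci, ignoring loci missing (None) in either Sample.

    Assumption: ignored loci are removed from both the numerator and r.
    """
    compared = [(a, b) for a, b in zip(profile_a, profile_b)
                if a is not None and b is not None]
    if not compared:
        return float("nan")  # no locus is present in both Samples
    different = sum(1 for a, b in compared if a != b)
    return different / len(compared)


def distance_matrix(profiles):
    """Symmetric matrix (dict of dicts) of pairwise distances; diagonal is 0."""
    names = list(profiles)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = pairwise_distance(profiles[n], profiles[m])
        matrix[n][m] = matrix[m][n] = d
    return matrix


profiles = {
    "S1": [1, 2, 3, 4],
    "S2": [1, 2, None, 5],
    "S3": [1, None, 3, 4],
}
dm = distance_matrix(profiles)
print(dm["S1"]["S2"], dm["S1"]["S3"], dm["S2"]["S3"])
```

Here S2 and S3 are compared on only two loci (the first and the fourth), because each Sample lacks an allele at a different one of the remaining loci.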



<references/>