Introduction

The S. aureus spa-typing task template assigns a spa-type from whole-genome sequencing (WGS) or Sanger sequencing data, based on the Staphylococcus protein A gene (spa). The task template can be downloaded from the Task Template Sphere.

If WGS data is used as input, a genome scanning step runs before the spa-typing starts and uses primer sequences to locate the spa region (forward primer: TAAAGACGATCCTTCGGTGAGC, reverse primer: CAGCAGTAGTGCCGTTTGCTT).
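
The primer-based lookup can be pictured as an in-silico PCR. The following Python snippet is only a minimal sketch of that idea, not the implementation used by the task template: it assumes exact primer matches, checks both strands, and returns the region spanned by the two primers. The function names are hypothetical.

```python
FORWARD_PRIMER = "TAAAGACGATCCTTCGGTGAGC"
REVERSE_PRIMER = "CAGCAGTAGTGCCGTTTGCTT"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement; characters other than ACGT are kept as-is."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_spa_region(genome: str):
    """Return the region from forward primer to reverse primer site, or None.

    Assumption: primers must match exactly. A real genome scan may
    tolerate mismatches; this sketch does not.
    """
    reverse_site = reverse_complement(REVERSE_PRIMER)
    # Try the given strand first, then its reverse complement.
    for strand in (genome.upper(), reverse_complement(genome)):
        start = strand.find(FORWARD_PRIMER)
        if start == -1:
            continue
        end = strand.find(reverse_site, start + len(FORWARD_PRIMER))
        if end != -1:
            return strand[start:end + len(reverse_site)]
    return None
```

The returned region includes both primer sites and is always reported in the orientation of the forward primer.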

The spa-typing searches for the 5' and 3' signatures (RCAMCAAAA, TAYATGTCGT) and trims the sequence to them. The trimmed sequence is then searched for known repeats and for potentially new repeats that match a specific pattern. As a quality control, a reliability rate is determined for the spa-typing result (see table below).
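
The signatures contain IUPAC ambiguity codes (R = A or G, M = A or C, Y = C or T), so they can be matched with regular expressions. The following Python snippet is a minimal sketch of the signature search and trimming under two assumptions that the text above does not state: the first occurrence of each signature is used, and the signatures themselves are excluded from the trimmed sequence. The function names are hypothetical.

```python
import re

# IUPAC codes that occur in the two signatures.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "M": "[AC]", "Y": "[CT]",
}

SIGNATURE_5 = "RCAMCAAAA"
SIGNATURE_3 = "TAYATGTCGT"


def iupac_to_regex(signature: str) -> "re.Pattern[str]":
    """Translate an IUPAC signature into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in signature))


def trim_to_signatures(sequence: str):
    """Return the sequence between the 5' and 3' signatures, or None.

    Assumptions: first match of each signature, signatures excluded
    from the result, no tolerance for mismatches.
    """
    sequence = sequence.upper()
    match_5 = iupac_to_regex(SIGNATURE_5).search(sequence)
    if match_5 is None:
        return None
    match_3 = iupac_to_regex(SIGNATURE_3).search(sequence, match_5.end())
    if match_3 is None:
        return None
    return sequence[match_5.end():match_3.start()]
```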

To guarantee a standardized nomenclature, the spa-types and repeats are named according to a global nomenclature that is controlled by the Ridom SpaServer. Each ID consists of a leading "t" (spa-type) or "r" (repeat) followed by a unique number.
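
As an illustration of this naming scheme, the following Python snippet splits an ID into its kind and number. It is a sketch only: the function name is hypothetical, and accepting leading zeros in the number is an assumption, since the text above does not specify any padding.

```python
import re

_ID_PATTERN = re.compile(r"([tr])(\d+)")


def parse_spa_id(identifier: str):
    """Split an ID such as 't123' or 'r45' into (kind, number)."""
    match = _ID_PATTERN.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not a spa-type or repeat ID: {identifier!r}")
    kind = "spa-type" if match.group(1) == "t" else "repeat"
    return kind, int(match.group(2))
```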

Spa-typing result in Task Entry Overview

The Task Entry Overview for a spa-typing task shows a result message colored according to the reliability, the four result fields, and, where applicable, links to unreliable positions in the sequence that need to be checked.

If an unknown spa-type is found in Sanger sequencing data and the reliability is "good" or "excellent" (reliability rate >= 100), the new spa-type can be submitted to the SpaServer by using the submission button in the Task Entry Overview. New spa-types that were found in WGS data cannot be submitted.
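
The submission rule above can be summarized as a single predicate. The following Python snippet is a sketch only; the function name, the parameter names, and the input labels "sanger" and "wgs" are hypothetical and not part of the software.

```python
def can_submit_new_spa_type(is_known_type: bool,
                            input_kind: str,
                            reliability_rate: int) -> bool:
    """Return True if a result qualifies for submission to the SpaServer.

    input_kind is a hypothetical label: "sanger" or "wgs".
    """
    return (
        not is_known_type              # only unknown spa-types are submitted
        and input_kind == "sanger"     # WGS results cannot be submitted
        and reliability_rate >= 100    # "good" or "excellent"
    )
```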

The following metadata fields are submitted together with the sequence data and the spa-typing result:

Submitter User ID (database ID and login name) *
Submitter Email Address *
Submitter Organisation *
Submitter City *
Submitter Country *
Sample ID *
Collection Date *
Country of Isolation *
Origin (Source Type and Source Subtype)
MRSA/MSSA

* mandatory field
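
A check of the fields marked as mandatory above could look like the following Python snippet. It is a sketch under assumptions: the dictionary keys are hypothetical identifiers chosen for this example and do not reflect the actual field names used by the software or the SpaServer.

```python
# Hypothetical keys for the mandatory metadata fields listed above.
MANDATORY_FIELDS = (
    "submitter_user_id",
    "submitter_email",
    "submitter_organisation",
    "submitter_city",
    "submitter_country",
    "sample_id",
    "collection_date",
    "country_of_isolation",
)

# Optional fields: "origin" and "mrsa_mssa" (also hypothetical keys).


def missing_mandatory_fields(metadata: dict) -> list:
    """Return the mandatory keys that are absent or blank in metadata."""
    return [
        name for name in MANDATORY_FIELDS
        if not str(metadata.get(name) or "").strip()
    ]
```

An empty result list means that all mandatory fields are filled in.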

Result Fields

The S. aureus spa-typing provides four result fields:

SpaType (shown in result table and used in comparison tables for distance calculation)
Repeats (shown in result table)
Reliability (shown in result table)
Reliability Rate

Sample result table containing the spa-typing result fields

When the SpaType field is used in a Comparison Table, distances for this field are calculated using the BURP (Based Upon Repeat Patterns) alignment algorithm. The algorithm is described in detail in Mellmann A, Weniger T, Berssenbrügge C, Rothgänger J, Sammeth M, Stoye J, and Harmsen D. Based Upon Repeat Pattern (BURP): an algorithm to characterize the long-term evolution of Staphylococcus aureus populations based on spa polymorphisms. BMC Microbiol. 2007, 7: 98 [PubMed 17967176], and in Sammeth M, and Stoye J. Comparing tandem repeats with duplications and excisions of variable degree. IEEE/ACM Trans Comput Biol Bioinform. 2006, 3: 395-407 [PubMed 17085848].

Reliability Rating

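The reliability of a spa-typing result is specified by a numeric value between 0 and 120. In the table below, "as below" means that the criterion of the next lower row also applies. Submitting a strain for a new spa-type requires a reliability of 100 or better. Because ratings above 50 require Sanger sequencing input, WGS data cannot reach this threshold, and new spa-types found in WGS data are therefore not submittable.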

Reliability Rate	Reliability	Input Data	Signature Search	Repeat Search
120	excellent	as below	as below	as below, but fewer than 5 editing steps in repeats
110	good	consensus of two Sanger sequencing chromatogram files	as below	as below, but no low-quality bases (consensus base quality < 20)
100	good	at least one Sanger sequencing chromatogram file	as below	as below, but no editing steps in new repeats
90	sufficient	at least one Sanger sequencing chromatogram file	as below	as below
60	sufficient	consensus of two Sanger sequencing FASTA files	as below	as below
50	sufficient	as below	both signatures found at the correct positions	as below
40	sufficient	as below	both signatures found, but position shifted less than 20 nt from expected position	as below, but no high-quality mismatches (chromatogram base quality ≥ 15)
30	poor (not reliable)	as below	both signatures found, but position shifted	as below
20	poor (not reliable)	as below	at least one signature not found, but sequence too short	as below
10	poor (not reliable)	as below	at least one signature not found	as below, but not more than 5 low-quality positions (consensus base quality < 20)
5	poor (not reliable)	as below	n/a	continuous repeat succession without gaps
0	poor (not analyzable)	any FASTA sequence	n/a	no repeats or noncontinuous repeat succession
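
The first two columns of the table above can be expressed as a simple lookup from the numeric rate to its label. The following Python snippet is a sketch for illustration; the names are hypothetical, and it covers only the mapping of rates to labels, not the criteria that determine a rate.

```python
# Rate-to-label mapping taken from the reliability table above.
RELIABILITY_LABELS = {
    120: "excellent",
    110: "good",
    100: "good",
    90: "sufficient",
    60: "sufficient",
    50: "sufficient",
    40: "sufficient",
    30: "poor (not reliable)",
    20: "poor (not reliable)",
    10: "poor (not reliable)",
    5: "poor (not reliable)",
    0: "poor (not analyzable)",
}


def reliability_label(rate: int) -> str:
    """Return the label for a reliability rate listed in the table."""
    try:
        return RELIABILITY_LABELS[rate]
    except KeyError:
        raise ValueError(f"unknown reliability rate: {rate}") from None


def meets_submission_threshold(rate: int) -> bool:
    """A new spa-type requires a reliability rate of 100 or better."""
    return rate >= 100
```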