Tutorial for Genome Sequencing Data MRSA

Overview

This tutorial describes how to use the Ridom SeqSphere+ software to analyze next-gen sequencing date with classic MLST and with an extended MLST (MLST+) with more than 1000 targets.

Furthermore, it is explained how to use a predefined MLST/MLST+ schema for automated sequence analysis. Staphylococcus aureus is used exemplarily for this demonstration. However, by reading this tutorial you should be able to define your own projects for other species.

This tutorial requires the 64 bit version of SeqSphere+.

Preliminaries

Step 1: If not done, download the SeqSphere+ client and install it on your computer.

Step 2: Download the example data archive SeqSphere_Tutorial_NGS_MRSA.zip (~300 MB), and extract the zip-file on your computer.

Step 3: Feel free to try this tutorial on the SeqSphere+ DEMO server: When starting SeqSphere+ for the first time, select Connect to SeqSphere+ DEMO Server.

Defining Project and Tasks

Creating New Project

Step 1: First launch the SeqSphere+ client and connect to your SeqSphere+ Server.

Step 2: Then create a new project for use with your sample data with the menu: File->New->Project

Step 3: Enter a name for your project (e.g., MRSA)

Adding MLST+ Task Template

Step 1: Each project within SeqSphere+ needs to have at least one Task Template associated. Press the button Add to add a new Task Template.

Step 2: The Task Template managing dialog opens. Click on the button Create New.

Step 3: A new dialog window opens. Click on the label Create Task Template for Next-Gen Sequencing Data.

Step 4: Then choose the type of Task Template to use: Create MLST+ Task Template linked with Nomenclature Server.

Step 5: Then choose as organism Staphylococcus aureus and as MLST+ Schema N315 demo. Click Next to download the Task Template.

Step 6: Finally the name and access rights for this Task Template can be set. Click Finish to confirm the dialog.

Step 7: The new Task Template is selected by default. Click OK to add it to your Project.

Adding Classic MLST Task Template

The next step is to add a second Task Template for a classic MLST.

Step 1: Press again the Add button to the right of the Task Templates again, and press the Create New button to create a new Task Template.

Step 2: Choose again Create Task Template for Next-Gen Sequencing Data.

Step 3: Now choose Create Task Template by Predefined MLST Schema.

Step 4: Choose the organism Staphylococcus aureus and the data will be downloaded from the public MLST server. Then click Next.

Step 5: Leave the default Target Parameters. Click Next.

Step 6: Check the name of your new Task Template, and confirm with Finish.

Step 7: Press OK to save the new Task Template and add it to your project.

Step 8: Press Save & Close to store your Project.

Working with Genome Sequencing Data

Importing Targets from Genome Sequencing Data

Step 1: Choose from the menu File > Create Samples from Assembled Genomes

Step 2: Choose the Project you just created.

Step 3: Press Add Task button to the right, and select all Task Templates of the project.

Step 4: Now use the button Add from File and choose two or more of the ace-files from the tutorial data folder. Those files are de-novo assemblies (assembled with MIRA) of NGS data.

Step 5: Then press the button Add from NCBI GenBank and add the reference genome NC_002745 to the input data.

Step 6: The Advanced Settings button can be used to define the BLAST settings, the matching thresholds, or to enable the batch mode. The batch mode can be used to invoke an import of several genomes without any confirmation dialogs. Because the samples are processed successively, the batch mode requires much less memory than the interactive mode. For this tutorial leave the settings to defaults.

Step 7: Confirm with OK.

Step 8: Ridom SeqSphere+ now loads all input sequence and finds (by using built-in BLAST) each of the target reference sequences that are defined in the Task Template.

Step 9: The scanning result for each input sequence is shown in table format, listing all the targets with their percent identity, alignment, start and stop positions and other relevant data points. The first column of table marks the targets that should be imported into SeqSphere+. By default, only the targets that fulfill the specified identity and alignment thresholds (e.g., 90% identity to ref.-seq., and 99% aligned of ref.-seq.) will be added to the new Sample entry. The targets that don't have a unique match that fulfills the thresholds are colored red. The thresholds are normally taken from the Task Template, but they also can be changed in this step.

Step 10: The button above the table can be used to show all BLAST hits for a target, and to show the alignment between the ref.-seq. and the assembled contig or genome region.

Step 11: Press the button Create/Extend Samples to import the targets that are checked in the first column.

Editing Samples of Imported Genome Sequencing Data

Step 1: After the import is completed, the navigation tree shows all new Samples. Each Sample node in the navigation has two sub nodes: The MLST+ task and the classic MLST task. Below the task nodes there are the target nodes. Each target node represents one sequence (often a gene) extracted from the input data (genomes or wgs contigs). The targets can have different states:
- Grey Targets were not extracted (because the match did not reached the thresholds in the previous step)
- Green Targets were extracted and fulfill all requirements that are defined in the Task Template Analysis Parameters.
- Yellow Targets were extracted, but fail at least in one of the requirements that are defined in those parameters. For example, they may have frame shifts and incorrect lengths compared to the published N315 reference sequence. Those targets must be inspected further.

Step 2: Together with the new Samples the Position Navigator is opened. This window shows the most interesting positions in the new sequences.

Step 3: Click on the Homopolymer related InDels. This table shows all found Homopolymer related InDels for the three new Samples. Click on the Auto-Correct InDels button to start the auto-correction. The parameters can be left to default. The auto-correction can only be done for sequences imported from an alignment file (ace), because the coverage and reads are needed to produce reliable results to determine corrections.

Step 4: Double click on a row in the table to jump to the according position in the contig. Errors that cannot be corrected automatically must be manually edited in the contig. Changes can be made to the sequence itself here by right-clicking on the sequence and choosing form the options in the context menu, Add base, Delete base, etc. For example, if many stop codons exist in contig, most likely there is a frame shift. This can be found by scrolling through the list of analysis problem locations.

Step 5: Finally click on File->Save All to store the new Samples in the database of the server.

Typing with MLST and MLST+

Step 1: For each imported Sample there is now on node in the navigation tree on the left.

Step 2: Double click on the sub-node Staphylococcus aureus MLST below such a Sample node to show the MLST results.

Step 3: Double click on the sub-node N315 demo to show the MLST+ results for the selected demonstration schema.

Step 4: When multiple samples were imported, a Comparison Table can be created. This will show the allele types of the samples in table format. Any differences between samples can be determined. The Comparison Table offers tools for distance calculation and phylogenetic trees, and also can be exported into Excel spreadsheet format.

Step 5: From the menu Tools->Comparison Table and press New Definition.

Step 6: Enter a name for your Comparison Table definition.

Step 7: Choose your new project.

Step 8: In the box Typing Results select both checkboxes, for MSLT and MLST+.

Step 9: Confirm with OK two times.

Step 10: A table with all allele types of the three samples is shown.

Step 11: Press for SeqSphere+ to calculate the distances between the samples and draw a minimum spanning tree for them. If the table contains missing data (targets that have no allele types assigned yet), the columns can be automatically removed from distance calculation.

Contents