Multi Species Analysis - Ridom Typer Documentation

Multi Species Analysis in different Projects

To process multiple species within one pipeline into different projects, the pipeline script must be able to detect the correct project for each sample. This can be done by defining the project acronym in FASTQ or FASTA file names, or by determining the genus and species by Mash identification.
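The file-name approach can be illustrated with a minimal Python sketch. It is not Ridom Typer pipeline script syntax, and it rests on two assumptions that do not come from this documentation: that the project acronym is the part of the file name before the first underscore, and that the acronyms SAU and ECO exist. Both are hypothetical and only serve to show the lookup idea.

```python
from pathlib import Path

# Hypothetical acronym-to-project mapping (assumption for illustration only).
PROJECTS = {
    "SAU": "Staphylococcus aureus",
    "ECO": "Escherichia coli",
}


def project_for_file(file_name, projects=PROJECTS):
    """Return the project whose acronym prefixes the file name, or None.

    Assumes the acronym is the text before the first underscore,
    e.g. 'SAU_sample01_R1.fastq.gz' -> 'SAU'.
    """
    base_name = Path(file_name).name          # drop any directory part
    acronym = base_name.split("_", 1)[0].upper()
    return projects.get(acronym)


assigned = project_for_file("reads/SAU_sample01_R1.fastq.gz")
unassigned = project_for_file("sample02.fasta")
```

A sample whose file name carries no known acronym yields None here; in such a case the species would have to be determined another way, for example by Mash identification.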

To create a comparison table for samples from different species-specific projects, open Tools | Comparison Table and choose Select Samples from Database. Then select all projects that should be added, press the Search button, and confirm with Choose All Found. The section Choose Task Template Fields shows all task templates that are used by at least one of the projects. By default, no task templates are selected, but the cgMLST fields Perc. Good Target and Complex Type and the MLST field ST are added to the comparison table even if the scheme is not selected. To include allele type fields or fields of other task templates, select the corresponding task templates in the list.

Once created, the comparison table contains multiple fields for Perc. Good Target, Complex Type, and ST, usually one per task template. To collapse those task-specific fields into one column each, use the function Columns | Merge Species/Task Specific Columns. Note that identical values in a merged column are not necessarily comparable: for example, STs with the same number may belong to different species and MLST schemes.
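The effect of merging, and the reason for the caveat about identical ST numbers, can be sketched in Python. This is only an illustration of the idea and not how Ridom Typer implements the function; the column names and sample rows are hypothetical.

```python
def merge_columns(row, columns):
    """Collapse several task-specific columns of one row into a single value.

    Returns the one non-empty value, or None if all columns are empty.
    """
    values = [row[c] for c in columns if row.get(c) not in (None, "")]
    if len(values) > 1:
        raise ValueError("more than one task-specific value present")
    return values[0] if values else None


# Hypothetical task-specific ST columns, one per MLST task template.
ST_COLUMNS = ["ST (S. aureus MLST)", "ST (E. coli MLST)"]

rows = [
    {"Sample": "A", "ST (S. aureus MLST)": 5, "ST (E. coli MLST)": None},
    {"Sample": "B", "ST (S. aureus MLST)": None, "ST (E. coli MLST)": 5},
]

merged_st = [merge_columns(row, ST_COLUMNS) for row in rows]
```

Both samples end up with the merged ST value 5, although the values come from different schemes. A merged column should therefore be read together with the species or project of each sample.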

Multi Species Analysis in same Project

If no cgMLST is required, multiple species can be organized in a single project that uses general task templates such as NCBI AMRFinderPlus.

When assembling different species in one pipeline, the pipeline script must determine the expected genome size automatically. This can be done by enabling the option Find Expected Genome Size automatically (Mash Distance) in the project settings of the pipeline script. The genome size is then derived from a closely matching reference genome.
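The principle of taking the genome size from the closest reference can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the implementation used by Ridom Typer: it assumes a list of reference hits with a Mash distance each and simply picks the smallest distance. The reference names, distances, and genome sizes are illustrative values, not measured data.

```python
from typing import NamedTuple


class ReferenceHit(NamedTuple):
    name: str
    mash_distance: float   # 0.0 = identical sketches, larger = more distant
    genome_size: int       # reference genome length in base pairs


def expected_genome_size(hits):
    """Return the genome size of the reference with the smallest Mash distance."""
    if not hits:
        raise ValueError("no reference hits available")
    closest = min(hits, key=lambda hit: hit.mash_distance)
    return closest.genome_size


# Illustrative hits for one sample (all values are assumptions).
hits = [
    ReferenceHit("Escherichia coli reference", 0.30, 5_000_000),
    ReferenceHit("Staphylococcus aureus reference", 0.01, 2_800_000),
]

size = expected_genome_size(hits)
```

With these example values the sample is closest to the second reference, so its genome size is used as the expected size for the assembly.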