Overview

Procedure statistics are calculated automatically by SeqSphere+. They are shown below the procedure details in the Procedure tab of the Sample Overview and can be exported like epidemiological field data. The read and assembly procedure statistics can also be exported to and imported from SPEC Files.

Contamination Check (Mash Screen)

| Field | Description | Submission |
|---|---|---|
| Top Species Match 1) | Best matching species in the Mash Screen result. If the best matching species differs from the species expected by the cgMLST task template that was used for processing, the field is highlighted yellow as a warning | |
| Top Species Match Identity 1) | Identity score for the best matching species | |
| Top Species Match Shared-Hashes 1) | Number of shared hashes for the best matching species | |
| Contamination Check Result 1) | Message with the result of the contamination check performed with Mash Screen. If a potential contamination was detected, the field is highlighted yellow as a warning | |
| Potential Contaminating Species 2) | If found, the second best matching species above thresholds in the Mash Screen result | |
| Potential Contaminating Species Identity 2) | Identity score for the second best matching species above thresholds | |
| Potential Contaminating Species Shared-Hashes 2) | Number of shared hashes for the second best matching species above thresholds | |

1) Only available if processed in a SeqSphere+ pipeline with Contamination Check (Mash Screen) enabled.
2) Only available if processed in a SeqSphere+ pipeline with Contamination Check (Mash Screen) enabled and a potential contamination was found.
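
How the top match and the potential contaminant could be derived from a Mash Screen result can be sketched as follows. This is only an illustration, not the SeqSphere+ implementation: it assumes the standard tab-separated `mash screen` output (identity, shared-hashes, median-multiplicity, p-value, query-ID, query-comment), uses the query-comment as a stand-in for the species name, and the `min_identity` and `min_shared` defaults are hypothetical placeholders, since the thresholds actually used are not stated here.

```python
from typing import List, NamedTuple, Optional, Tuple


class MashHit(NamedTuple):
    identity: float  # column 1 of the mash screen output
    shared: int      # numerator of column 2 (shared-hashes, e.g. 950/1000)
    total: int       # denominator of column 2
    label: str       # query-comment (column 6), else query-ID (column 5)


def parse_mash_screen(text: str) -> List[MashHit]:
    """Parse tab-separated mash screen output and return hits, best first."""
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue
        cols = line.split("\t")
        shared, total = (int(x) for x in cols[1].split("/"))
        label = cols[5] if len(cols) > 5 and cols[5] else cols[4]
        hits.append(MashHit(float(cols[0]), shared, total, label))
    # Rank by identity, then by number of shared hashes.
    hits.sort(key=lambda h: (h.identity, h.shared), reverse=True)
    return hits


def contamination_summary(
    hits: List[MashHit],
    min_identity: float = 0.9,  # hypothetical threshold, not from SeqSphere+
    min_shared: int = 100,      # hypothetical threshold, not from SeqSphere+
) -> Tuple[Optional[MashHit], Optional[MashHit]]:
    """Return (top match, first differently labeled hit above thresholds)."""
    top = hits[0] if hits else None
    contaminant = None
    for hit in hits[1:]:
        if (hit.label != top.label
                and hit.identity >= min_identity
                and hit.shared >= min_shared):
            contaminant = hit
            break
    return top, contaminant
```

A second value of `None` corresponds to the case in which the fields marked with 2) stay empty.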

Read Statistics

| Field | Description | Submission |
|---|---|---|
| FastQC Per Base Sequence Quality (Forward Reads) 3) | Base quality check result from FASTQ Quality Control (FastQC) processing for forward reads; if the check reports a warning or a failure, the field is highlighted yellow or red, respectively | |
| FastQC Per Base Sequence Quality (Reverse Reads) 3) | Base quality check result from FASTQ Quality Control (FastQC) processing for reverse reads; if the check has failed, the field is highlighted red; warnings are ignored for reverse reads | |
| FastQC Adapter Content 3) | Adapter content check result from FASTQ Quality Control (FastQC) processing; if the check reports a warning or a failure, the field is highlighted yellow or red, respectively | |
| Avg. Coverage (Unassembled) 3) | Estimated based on the genome size of the seed genome and the unprocessed reads | |
| Avg. Coverage (Processed, Unassembled) 3) | Estimated based on the genome size of the seed genome and the processed (trimmed and/or downsampled) reads | |
| Avg. Read Length (Unassembled) 3) | Average length of the unassembled reads | |
| Avg. Read Length (Processed, Unassembled) 3) | Average length of the processed (trimmed and/or downsampled), unassembled reads | |
| Read Count (Unassembled) 3) | Number of unassembled reads | |
| Read Count (Processed, Unassembled) 3) | Number of processed (trimmed and/or downsampled), unassembled reads | |
| Read Base Count (Unassembled) 3) | Sum of all bases of the unassembled reads | |
| Read Base Count (Processed, Unassembled) 3) | Sum of all bases of the processed (trimmed and/or downsampled), unassembled reads | |

3) Only available if assembled with a SeqSphere+ pipeline.
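
The arithmetic behind the read statistics above can be illustrated with a minimal sketch. It assumes that the average coverage estimate is simply the read base count divided by the seed genome size; the exact formula used by SeqSphere+ is not given here, and the function name `read_statistics` is hypothetical.

```python
from typing import Dict, Sequence


def read_statistics(read_lengths: Sequence[int], seed_genome_size: int) -> Dict[str, float]:
    """Compute read count, base count, average read length and estimated coverage.

    Assumption: coverage = total read bases / seed genome size.
    """
    read_count = len(read_lengths)
    base_count = sum(read_lengths)
    avg_length = base_count / read_count if read_count else 0.0
    avg_coverage = base_count / seed_genome_size if seed_genome_size else 0.0
    return {
        "read_count": read_count,
        "read_base_count": base_count,
        "avg_read_length": avg_length,
        "avg_coverage": avg_coverage,
    }
```

Calling the function once with the raw read lengths and once with the lengths after trimming or downsampling yields the two variants (unprocessed and processed) of each field.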

Assembly Statistics

| Field | Description | Submission |
|---|---|---|
| Contig Count (Assembled) 4) | Number of contigs in the assembly | |
| N50 (Assembled) 4) | N50 calculated for the assembly | |
| GC-Content (Assembled) 4) | GC content of the consensus; if the deviation from the GC content of the seed genome is higher than 5%, the field is highlighted yellow as a warning | |
| Read Count (Assembled) 5) | Number of reads used in the assembly | |
| Read Fwd Count (Assembled) 5) | Number of forward reads used in the assembly | |
| Read Rev Count (Assembled) 5) | Number of reverse reads used in the assembly | |
| Assembly Base Count | Number of bases in all contigs of the assembly | |
| Approximated Genome Size (Mbases) | Number of bases in all contigs of the assembly in Mbases; if the deviation from the Expected Genome Size is higher than 25%, the field is highlighted yellow as a warning | |
| Perc. Covered Genome 6) | Percentage of non-ambiguous bases relative to the 'Expected Genome Size'; this value is highlighted for quality control: ≥95%: green, <95%: yellow, <90%: red | |
| Consensus Bases Below Coverage Threshold (Ns) 6) | Number of consensus bases that were called as N because they are below the defined coverage threshold | |
| Consensus Bases with Non-N Ambiguity 6) | Number of consensus bases with ambiguity symbols other than N | |
| Genome wide variants below 70% frequency in reads 6) | Number of variants with a frequency below 70% in the aligned reads; this value is highlighted for quality control: ≤4: green, ≤10: yellow, >10: red | |
| Variants (genome wide) 6) | Number of genome-wide variants relative to the reference with a frequency of at least 70% in the reads | |
| Max Contig Length (Assembled) 4) | Maximum length of a contig in the assembly | |
| Min Contig Length (Assembled) 4) | Minimum length of a contig in the assembly | |
| Mean Contig Length (Assembled) 4) | Mean length of a contig in the assembly | |
| Avg. Coverage (Assembled) 7) | Estimated based on the consensus base count and the assembled read base count, or imported from the GenBank entry; this value is highlighted for quality control. For Illumina and bacteria, the default thresholds are ≥75: green, <75: yellow, <30: red; if the Library Construction Method is set to Illumina DNA Prep, the thresholds are ≥28: green, <28: yellow, <20: red. For virus, the thresholds are ≥1000: green, <1000: yellow, <400: red. For PacBio, the thresholds are ≥40: green, <40: yellow, <20: red. For Oxford Nanopore, the thresholds are ≥50: green, <50: yellow, <30: red | |
| Genome Status | Filled if the sequence data was imported from NCBI | |
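
The coverage highlighting rules for Avg. Coverage (Assembled) can be written down as a small lookup. This is a sketch for illustration only: the profile keys such as `illumina_bacteria` and the function name `coverage_color` are hypothetical, and only the numeric thresholds are taken from the description above.

```python
# (green if coverage >= first value, red if coverage < second value, else yellow)
COVERAGE_THRESHOLDS = {
    "illumina_bacteria": (75, 30),   # default for Illumina and bacteria
    "illumina_dna_prep": (28, 20),   # Library Construction Method: Illumina DNA Prep
    "virus": (1000, 400),
    "pacbio": (40, 20),
    "oxford_nanopore": (50, 30),
}


def coverage_color(avg_coverage: float, profile: str) -> str:
    """Map an average assembled coverage to green, yellow or red."""
    green_min, red_below = COVERAGE_THRESHOLDS[profile]
    if avg_coverage >= green_min:
        return "green"
    if avg_coverage < red_below:
        return "red"
    return "yellow"
```

For example, `coverage_color(45, "oxford_nanopore")` falls between the two limits and is therefore reported as yellow.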

4) Includes all contigs, i.e., also contigs that are smaller than 200 bases.
5) Only available if imported from ACE/BAM or assembled with a SeqSphere+ pipeline.
6) Only used for virus projects.
7) Only available if imported from ACE/BAM, available in GenBank entry, or assembled with a SeqSphere+ pipeline using Velvet, SPAdes with remapping, or SKESA with remapping or defined read length.
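
As an illustration of the contig-based fields, the following sketch computes contig count, N50, minimum, maximum and mean contig length, and GC content from a list of contig sequences. It uses the common definition of N50 (the length of the contig at which the cumulative length of the contigs, sorted from longest to shortest, reaches at least half of the total assembly length) and assumes that GC content is calculated over unambiguous A, C, G and T bases only; whether SeqSphere+ handles ambiguity symbols this way is not stated here. In line with footnote 4, no contigs are filtered out by length.

```python
from typing import Dict, Sequence


def contig_statistics(contigs: Sequence[str]) -> Dict[str, float]:
    """Summarize an assembly given its contig sequences (all contigs included)."""
    lengths = sorted((len(c) for c in contigs), reverse=True)
    total = sum(lengths)

    # N50: walk from the longest contig down until half the assembly is covered.
    n50 = 0
    running = 0
    for length in lengths:
        running += length
        if running * 2 >= total:
            n50 = length
            break

    # GC content in percent over unambiguous bases (assumption, see lead-in).
    gc = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)
    at = sum(c.upper().count("A") + c.upper().count("T") for c in contigs)
    gc_content = 100.0 * gc / (gc + at) if (gc + at) else 0.0

    return {
        "contig_count": len(lengths),
        "assembly_base_count": total,
        "n50": n50,
        "max_contig_length": lengths[0] if lengths else 0,
        "min_contig_length": lengths[-1] if lengths else 0,
        "mean_contig_length": total / len(lengths) if lengths else 0.0,
        "gc_content": gc_content,
    }
```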

cgMLST Statistics

| Field | Description | Submission |
|---|---|---|
| Procedure cgMLST Perc. Good Targets 8) | Percentage of cgMLST targets that passed the initial target QC procedure; this value is highlighted for quality control: ≥95%: green, ≥90%: yellow, <90%: red | |
| Procedure cgMLST Perc. Warning Targets 8) | Percentage of cgMLST targets that passed the initial target QC procedure with warnings | |

8) The stated values are the result of the initial target QC procedure that was performed with the sequence data. If the sample is reprocessed, they are not updated. If a project contains multiple cgMLST task templates, only the values for the first one are shown.
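
The highlighting rule for the percentage of good cgMLST targets can be sketched as below. The function name `good_targets_color` is hypothetical, and the sketch assumes that the percentage is the number of good targets divided by the total number of targets in the task template; only the 95% and 90% limits come from the field description.

```python
from typing import Tuple


def good_targets_color(good_targets: int, total_targets: int) -> Tuple[float, str]:
    """Return the percentage of good targets and its quality control color."""
    if total_targets <= 0:
        raise ValueError("total_targets must be positive")
    percent = 100.0 * good_targets / total_targets
    if percent >= 95.0:
        color = "green"
    elif percent >= 90.0:
        color = "yellow"
    else:
        color = "red"
    return percent, color
```

With 1800 good targets out of 2000, the percentage is exactly 90% and the field would be shown in yellow under these rules.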