Introduction
The NCBI authors of the new de novo assembler SKESA claim in their publication (citation) that the assembler
In this evaluation we aim to verify these claims, with a specific focus on allele calling efficiency.
Methods
Three strains with finished genomes were re-sequenced on an Illumina MiSeq. The strains cover the whole range of G+C content: low, Staphylococcus aureus strain COL (NC_002951; 2.8 Mb genome); medium, Escherichia coli strain Sakai (NC_009089; 5.5 Mb genome); and high, Pseudomonas aeruginosa strain PAO1 (NC_002516; 6.3 Mb genome). 250bp Nextera XT paired-end (PE) libraries were produced for all three strains. In addition, 150bp and 300bp PE libraries were constructed for S. aureus. Finally, two 250bp PE libraries were produced from mixtures of S. aureus and Enterococcus faecium strain ATCC BAA-472 (NC_017960.1; 3 Mb genome) at different concentrations. The data were assembled with the three de novo assemblers available in SeqSphere+: SKESA (version 2.3), SPAdes (version 3.11), and Velvet (version 1.1). Before assembly, the data were downsampled to different estimated coverages. After assembly, allele calling was performed in SeqSphere+ using cgMLST reference (seed)-only schemes based on the NCBI GenBank entry of the same strain. The following parameters were compared across coverages and assemblers: assembler run time, allele calling efficiency (percentage of good cgMLST targets), and N50.
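The downsampling step rests on the usual coverage estimate, coverage = (number of reads × read length) / genome size. The sketch below is hypothetical: the text does not say how SeqSphere+ downsamples, and the names `read_pairs_for_coverage` and `downsample_pairs` are illustrative only. It assumes full-length (untrimmed) reads and random sampling of whole read pairs.

```python
import math
import random


def read_pairs_for_coverage(target_coverage: float, genome_size_bp: int,
                            read_length_bp: int) -> int:
    """Read pairs needed to reach target_coverage, assuming every mate is
    read_length_bp long: coverage = pairs * 2 * read_length / genome_size."""
    bases_needed = target_coverage * genome_size_bp
    bases_per_pair = 2 * read_length_bp
    return math.ceil(bases_needed / bases_per_pair)


def downsample_pairs(pairs: list, n: int, seed: int = 1) -> list:
    """Randomly keep n read pairs without replacement. Each list element is
    one pair, so both mates are always kept or dropped together."""
    if n >= len(pairs):
        return list(pairs)
    rng = random.Random(seed)  # fixed seed makes the subset reproducible
    return rng.sample(pairs, n)


if __name__ == "__main__":
    # S. aureus COL (rounded 2.8 Mb genome), 250bp PE reads, 50x target
    print(read_pairs_for_coverage(50, 2_800_000, 250))  # 280000
```

Sampling whole pairs rather than single reads keeps the data valid paired-end input for all three assemblers.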
Results
Assembler Time / Allele Calling Efficiency / N50 from Pure Culture
Staphylococcus aureus 250bp PE
SeqSphere+ used a cgMLST reference-only scheme with 2,486 targets. SKESA and SPAdes were run on Linux, whereas Velvet was run on Windows.
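The two quality metrics in these results can be stated precisely. The sketch below assumes that allele calling efficiency is the share of scheme targets that the allele caller marks as good (as the heading "Percentage of Good cgMLST Targets" suggests) and uses the standard N50 definition. The per-target "good" decision made by SeqSphere+ is not reproduced; the counts passed in are hypothetical inputs.

```python
def n50(contig_lengths: list[int]) -> int:
    """Length of the contig at which the running sum of contig lengths,
    sorted longest first, first reaches half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("assembly has no contigs")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise AssertionError("unreachable for non-negative contig lengths")


def percent_good_targets(good_targets: int, scheme_targets: int) -> float:
    """Allele calling efficiency as the percentage of cgMLST targets called 'good'."""
    return 100.0 * good_targets / scheme_targets


if __name__ == "__main__":
    print(n50([400, 300, 200, 100]))         # 300
    print(percent_good_targets(1243, 2486))  # 50.0 (hypothetical count)
```

A higher N50 means fewer, longer contigs, which tends to reduce the number of cgMLST targets split across contig ends.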
SeqSphere+ used a cgMLST reference-only scheme with 4,225 targets. SKESA and SPAdes were run on Linux, whereas Velvet was run on Windows.
SeqSphere+ used a cgMLST reference-only scheme with 5,267 targets. SKESA and SPAdes were run on Linux, whereas Velvet was run on Windows.
Percentage of Good cgMLST Targets / N50 from Mixed Culture
DNA of S. aureus strain COL (NC_002951; 2.8 Mb genome) and Enterococcus faecium strain ATCC BAA-472 (NC_017960.1; 3 Mb genome) was mixed in ratios of 60:40 and 90:10. The resulting 250bp PE Nextera XT libraries were sequenced on a MiSeq. The reads were downsampled to different estimated coverages relative to the COL genome size and processed in SeqSphere+ using a Staphylococcus aureus cgMLST reference-only scheme with 2,486 targets. SKESA and SPAdes were run on Linux, whereas Velvet was run on Windows. For comparison, the graphs also show the pure-culture data of the COL strain.
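Because the mixed-culture coverages were estimated relative to the COL genome size, only part of each nominal coverage value comes from S. aureus reads. The sketch below is a back-of-the-envelope estimate under a loudly simplifying assumption: reads are drawn in proportion to the DNA mixing ratio, with no library or sequencing bias. The function name `effective_coverage` is hypothetical.

```python
def effective_coverage(nominal_coverage: float, target_part: float,
                       other_part: float) -> float:
    """Coverage of the target genome when nominal_coverage was estimated from
    ALL reads against the target genome size. ASSUMPTION: the fraction of
    reads from the target equals its share of the DNA mixture."""
    fraction = target_part / (target_part + other_part)
    return nominal_coverage * fraction


if __name__ == "__main__":
    # Nominal 50x relative to COL, for the 60:40 and 90:10 mixtures
    for ratio in ((60, 40), (90, 10)):
        print(ratio, round(effective_coverage(50, *ratio), 1))
```

Under this assumption, a nominal 50x yields roughly 30x of COL data in the 60:40 mixture and 45x in the 90:10 mixture, which is one reason the mixed-culture curves can be compared against the pure-culture COL data.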
Summary
Those claims of the SKESA authors that were checked could be verified: SKESA does indeed produce high-quality de novo assemblies very quickly, and these assemblies are very well suited for cgMLST allele calling.