Control and Edit Laboratory and Assembly Procedure Details

Overview

The Sequence Specification is used to document the laboratory procedure, the assembly procedure and the scanning procedure. Sequence Specifications can be predefined and managed using the menu item Options | Sequence Specifications.

When a new Sample is created from WGS data, a Sequence Specification can be assigned to the data and stored together with the Sample. The stored specifications can be viewed and modified in the Procedure tab of the Sample Overview. They can also be exported or added to a Comparison Table.

Laboratory procedure details

The laboratory procedure details must be specified manually, either for an individual input file or in the pipeline script.

Field | Description | ENA submission
Nucleic acid extraction | Link to a literature reference, electronic resource or standard operating procedure (SOP) | -
Library source | Library source used (genomic includes PCR products from genomic DNA) | Required
Library strategy | Library strategy used | Required
Library selection | Library selection used | Required
Library construction method | Library construction method used | -
Library amplification method | Library amplification method used | -
Sequencing protocol | Sequencing protocol used | Required
Library insert size | Library insert size in base pairs (excluding adaptors and/or primers) | Required
Sequencing length | Number of bases of insert sequenced | Required
Sequencing vendor | Sequencing platform producer | Required
Sequencing platform | Sequencing platform used | Required
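If you script around exported Sample data, the laboratory procedure fields can be read as a simple record schema. The Python sketch below is only an illustration: the class name, attribute names and the `missing_for_ena` helper are hypothetical and are not SeqSphere+ or ENA identifiers. Only the set of fields marked "Required" is taken from the ENA submission column.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LaboratoryProcedure:
    """Hypothetical record of the laboratory procedure fields (names are illustrative)."""
    nucleic_acid_extraction: Optional[str] = None       # literature reference, URL or SOP
    library_source: Optional[str] = None
    library_strategy: Optional[str] = None
    library_selection: Optional[str] = None
    library_construction_method: Optional[str] = None
    library_amplification_method: Optional[str] = None
    sequencing_protocol: Optional[str] = None
    library_insert_size: Optional[int] = None            # bp, excluding adaptors/primers
    sequencing_length: Optional[int] = None              # bases of insert sequenced
    sequencing_vendor: Optional[str] = None
    sequencing_platform: Optional[str] = None


# Fields marked "Required" in the ENA submission column, in table order.
ENA_REQUIRED = (
    "library_source",
    "library_strategy",
    "library_selection",
    "sequencing_protocol",
    "library_insert_size",
    "sequencing_length",
    "sequencing_vendor",
    "sequencing_platform",
)


def missing_for_ena(proc: LaboratoryProcedure) -> List[str]:
    """Return the ENA-required fields that are still empty (None or "")."""
    return [name for name in ENA_REQUIRED if getattr(proc, name) in (None, "")]


if __name__ == "__main__":
    proc = LaboratoryProcedure(library_source="GENOMIC", library_strategy="WGS",
                               sequencing_vendor="Illumina")
    print(missing_for_ena(proc))
    # ['library_selection', 'sequencing_protocol', 'library_insert_size',
    #  'sequencing_length', 'sequencing_platform']
```

The values "GENOMIC", "WGS" and "Illumina" are example inputs only.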

Assembly procedure details

The assembly procedure details can be specified manually for an input file or defined in the pipeline script. If the pipeline performs an assembly (de novo or mapping), any existing values are overwritten with the assembly details that were actually used.
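The overwrite rule can be pictured as a dictionary merge in which the values reported by the pipeline win. This is a minimal sketch under one stated assumption: fields the pipeline does not report (for example a manually entered sequencing comment) are kept unchanged. The function name `effective_assembly_details` is hypothetical and not part of SeqSphere+.

```python
from typing import Dict, Optional


def effective_assembly_details(
    existing: Dict[str, str], used_by_pipeline: Optional[Dict[str, str]]
) -> Dict[str, str]:
    """Combine stored assembly details with those reported by a pipeline run.

    Hypothetical helper. If the pipeline did not assemble (None), the stored
    values are kept. Otherwise every field the pipeline reports replaces the
    stored value; unreported fields stay as they were (assumption).
    """
    merged = dict(existing)          # copy so the caller's dict is not mutated
    if used_by_pipeline is not None:
        merged.update(used_by_pipeline)
    return merged


if __name__ == "__main__":
    manual = {"Assembly type": "mapping", "Assembler": "SPAdes",
              "Sequencing comment": "re-run"}
    used = {"Assembly type": "de novo", "Assembler": "SKESA"}
    print(effective_assembly_details(manual, used))
    # {'Assembly type': 'de novo', 'Assembler': 'SKESA', 'Sequencing comment': 're-run'}
```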

Field | Description
Assembly pre-processing | Coverage downsampled and/or reads trimmed by quality (window and QV)
Assembly type | General assembly approach (de novo or mapping)
Mapping reference genome | NCBI accession number of the reference genome against which the raw read data were mapped
Assembler | Software used for assembly
Assembler version | Version of the software used for assembly
Assembler parameters | Assembler parameters used for assembly
Sequencing comment | Additional information regarding laboratory and assembly metadata
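Quality trimming "by window and QV" is commonly implemented as sliding-window trimming: the read is scanned from the 5' end and cut at the first window whose mean Phred quality falls below the QV threshold. The sketch below shows that generic technique only; SeqSphere+'s actual trimming rule and default parameters are not described here, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def sliding_window_trim(seq: str, quals: Sequence[int], window: int,
                        min_qv: float) -> Tuple[str, List[int]]:
    """Clip a read at the first window (scanning 5' to 3') whose mean Phred
    quality is below min_qv. Reads shorter than the window form one window."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if window <= 0:
        raise ValueError("window size must be positive")
    n = len(quals)
    if n == 0:
        return "", []
    w = min(window, n)
    total = sum(quals[:w])                 # quality sum of the current window
    for start in range(n - w + 1):
        if start > 0:                      # slide by one: add new base, drop old one
            total += quals[start + w - 1] - quals[start - 1]
        if total / w < min_qv:
            return seq[:start], list(quals[:start])  # cut at the failing window
    return seq, list(quals)


if __name__ == "__main__":
    quals = [30] * 6 + [10] * 4
    print(sliding_window_trim("ACGTACGTAC", quals, window=4, min_qv=20))
    # ('ACGTA', [30, 30, 30, 30, 30])
```

Cutting at the start of the failing window is the simplest, most conservative choice; other implementations differ in where exactly inside the failing window the read is cut.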

Assembly statistics

SeqSphere+ calculates some additional statistics for the assembled sequence. These fields are also shown in the Sequence Specification table of the Procedure tab and can be exported like normal fields. Because the values are calculated automatically, these fields are not shown in the Sequence Specification management dialog.

Field | Description
Contig Count (Assembled) | Number of contigs in the assembly [1]
Read Count (Assembled) [2] | Number of reads used in the assembly
Read Fwd Count (Assembled) [2] | Number of forward reads used in the assembly
Read Rev Count (Assembled) [2] | Number of reverse reads used in the assembly
Base Count (Assembled) | Number of bases in the assembly
Max Contig Length (Assembled) | Maximum contig length in the assembly [1]
Min Contig Length (Assembled) | Minimum contig length in the assembly [1]
Mean Contig Length (Assembled) | Mean contig length in the assembly [1]
N50 | N50 calculated for the assembly [1]
Avg. Coverage (Unassembled) [3] | Estimated from the genome size of the reference genome and the unprocessed reads
Avg. Coverage (Processed, Unassembled) [3] | Estimated from the genome size of the reference genome and the processed (trimmed and/or downsampled) reads
Avg. Coverage (Assembled) [4] | Estimated from the consensus base count and the assembled read base count, or imported from a GenBank entry
Avg. Read Length (Unassembled) [3] | Average length of the unassembled reads
Avg. Read Length (Processed, Unassembled) [3] | Average length of the processed (trimmed and/or downsampled), unassembled reads
Read Count (Unassembled) [3] | Number of unassembled reads
Read Count (Processed, Unassembled) [3] | Number of processed (trimmed and/or downsampled), unassembled reads
Read Base Count (Unassembled) [3] | Sum of all bases of the unassembled reads
Read Base Count (Processed, Unassembled) [3] | Sum of all bases of the processed (trimmed and/or downsampled), unassembled reads
Downsampled to Coverage [3] | Expected genome coverage used to calculate the ratio for random downsampling
Expected Genome Size for Downsampling [3] | Expected genome size used to calculate the ratio for random downsampling
Quality Trimming [3] | Parameters used for quality trimming of the reads

[1] Includes all contigs, i.e., also contigs shorter than 200 bases.
[2] Only available if imported from ACE/BAM or assembled with a SeqSphere+ pipeline.
[3] Only available if assembled with a SeqSphere+ pipeline.
[4] Only available if imported from ACE/BAM, assembled with a SeqSphere+ pipeline, or available in a GenBank entry.
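The contig-based statistics follow directly from the list of contig lengths. The sketch below uses the standard N50 definition (the length of the contig at which the contigs, sorted from longest to shortest, first cover at least half of all assembled bases) and, following footnote 1, applies no minimum contig length. It assumes that "Base Count (Assembled)" is the sum of all contig lengths; the function name is hypothetical and this is not SeqSphere+'s own code.

```python
from typing import Dict, Sequence, Union


def assembly_statistics(contig_lengths: Sequence[int]) -> Dict[str, Union[int, float]]:
    """Contig-based statistics for one assembly (hypothetical helper).

    All contigs are counted, including those shorter than 200 bases.
    """
    if not contig_lengths:
        raise ValueError("assembly contains no contigs")
    total = sum(contig_lengths)
    # N50: walk contigs from longest to shortest until at least half of all
    # assembled bases are covered; the length of that contig is the N50.
    running = 0
    n50 = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "contig_count": len(contig_lengths),
        "base_count": total,
        "max_contig_length": max(contig_lengths),
        "min_contig_length": min(contig_lengths),
        "mean_contig_length": total / len(contig_lengths),
        "n50": n50,
    }


if __name__ == "__main__":
    print(assembly_statistics([100, 200, 300, 400, 500]))
    # {'contig_count': 5, 'base_count': 1500, 'max_contig_length': 500,
    #  'min_contig_length': 100, 'mean_contig_length': 300.0, 'n50': 400}
```

In the example, the two longest contigs (500 + 400 = 900 bases) are the first to reach half of the 1500 assembled bases, so the N50 is 400.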
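The read-based fields can be approximated with a few simple formulas. This is a sketch under stated assumptions, not the documented SeqSphere+ calculation: average coverage is taken as read bases divided by genome size (for "Avg. Coverage (Assembled)", the same function would be called with the assembled read base count and the consensus base count), and the downsampling ratio is taken as target coverage times expected genome size divided by the available read bases, capped at 1. All function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def read_statistics(read_lengths: Sequence[int]) -> Dict[str, float]:
    """Read Count, Read Base Count and Avg. Read Length for one set of reads."""
    count = len(read_lengths)
    bases = sum(read_lengths)
    return {
        "read_count": count,
        "read_base_count": bases,
        "avg_read_length": bases / count if count else 0.0,
    }


def estimated_coverage(base_count: int, genome_size: int) -> float:
    """Average coverage estimated as bases divided by genome size (assumption)."""
    if genome_size <= 0:
        raise ValueError("genome size must be positive")
    return base_count / genome_size


def downsampling_ratio(read_base_count: int, expected_genome_size: int,
                       target_coverage: float) -> float:
    """Fraction of reads to keep so that about target_coverage remains (max 1.0)."""
    if read_base_count <= 0:
        return 1.0  # nothing to downsample
    wanted_bases = target_coverage * expected_genome_size
    return min(1.0, wanted_bases / read_base_count)


def downsample(reads: Sequence[T], ratio: float, seed: int = 0) -> List[T]:
    """Keep each read independently with probability `ratio` (random downsampling)."""
    rng = random.Random(seed)
    return [read for read in reads if rng.random() < ratio]


if __name__ == "__main__":
    genome_size = 5_000_000          # example value, not a SeqSphere+ default
    read_bases = 1_000_000_000
    print(estimated_coverage(read_bases, genome_size))       # 200.0
    print(downsampling_ratio(read_bases, genome_size, 100))  # 0.5
```

This sketch samples reads independently; for paired-end data, a real implementation would typically keep or drop both mates of a pair together.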

Scanning procedure details

[Figure: Example of scanning procedure details]

The scanning procedure details are stored as a single text field. In contrast to the other two sections, these details are not editable and are not associated with an assembly file. They are always filled in automatically by SeqSphere+ and associated with a Sample.