Overview

SPEC files can be used in SeqSphere+ to export and import metadata together with sequence data files. They are useful for several purposes:

  • Exporting/importing contig files from Samples together with metadata
The menu function File | Export Sample Contig/SPEC Files can be used to export the FASTA assembly contigs for multiple Samples. By default this also stores a separate SPEC file for each FASTA file, with the same name but extension ".spec". If exported FASTA files are later imported by SeqSphere+, the SPEC files are automatically detected and imported as procedure details and statistics. Optionally, metadata and tags can also be exported/imported using SPEC files.
  • Exporting/importing predefined procedure details
Laboratory and assembly procedure details can be predefined and managed using the menu function Options | Procedure Details. The predefined sets can be exported to SPEC files. They can also be imported from SPEC files that were exported from existing Samples in the Procedure tab.
  • Storing metadata for downloaded FASTQ files
The menu function Tools | Download FASTQ from SRA can be used to download FASTQ files from NCBI SRA. The metadata of the downloaded runs is automatically stored in SPEC files in the same directory as the FASTQ files.
  • Forwarding procedure details and statistics from an external assembly pipeline
If an assembly pipeline outside of SeqSphere+ (e.g., HGAP) is used to create contig files for later import into SeqSphere+, SPEC files can be used to hand over the assembly statistics. The SPEC files must be located in the same directory as the contig files and must have the same names as the contig files, but with the extension ".spec". SeqSphere+ then processes them automatically together with the contig files (the supported fields are listed below).
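
The hand-over from an external pipeline could be sketched as follows. This is a minimal illustration, not SeqSphere+ code: the helper name write_spec is hypothetical, and it assumes that ".spec" replaces the contig file's own extension (e.g., sample1.fasta gives sample1.spec).

```python
from pathlib import Path


def write_spec(contig_path, fields):
    """Write a SPEC file next to a contig file (hypothetical helper).

    Assumption: the SPEC file name is the contig file name with its
    extension replaced by ".spec".
    """
    spec_path = Path(contig_path).with_suffix(".spec")
    # One "field=value" pair per line, stored as UTF-8 plain text.
    lines = [f"{name}={value}" for name, value in fields.items()]
    spec_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return spec_path
```

A pipeline step might call it as write_spec("out/sample1.fasta", {"pf.assembler": "HGAP", "pf.contig_count_(assembled)": 12}) after the assembly has finished.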

SPEC File Format

The SPEC file for a sample must have the same name as the input sequence file (e.g., FASTA or FASTQ) but with the file name extension ".spec". If a pipeline uses its own file naming scheme, the SPEC file may instead be named after the sample ID. In addition, a SPEC file named "sequence_specification.spec" applies to all sequence files in its directory. If multiple SPEC files are found for a sample, they are merged.
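
The naming rules above can be sketched as a lookup function. This is only an illustration under stated assumptions: the function name candidate_spec_files is hypothetical, ".spec" is assumed to replace the last extension of the sequence file, and the order of the returned list is arbitrary because this page does not state which file takes precedence when SPEC files are merged.

```python
from pathlib import Path


def candidate_spec_files(sequence_file, sample_id=None):
    """Return the existing SPEC files that could apply to a sequence file."""
    seq = Path(sequence_file)
    candidates = [
        # Directory-wide SPEC file for all sequence files in the directory.
        seq.parent / "sequence_specification.spec",
    ]
    if sample_id:
        # SPEC file named after the sample ID.
        candidates.append(seq.parent / f"{sample_id}.spec")
    # SPEC file named after the sequence file itself.
    candidates.append(seq.with_suffix(".spec"))

    found = []
    for path in candidates:
        if path.is_file() and path not in found:
            found.append(path)
    return found
```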

The content of a SPEC file is plain text (UTF-8) in which each line holds a single field and value pair in the format field=value (e.g., pf.avg._coverage_(assembled)=111). The fields may be in any order. The following fields can be set in a SPEC file and will be imported as metadata. Dates can be given as yyyy-MM-dd.
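
As a rough sketch, the field=value format could be read like this. The function names are hypothetical. Splitting at the first "=" and skipping blank lines are assumptions made for the example; the page does not describe how SeqSphere+ itself handles such lines.

```python
from datetime import datetime


def parse_spec(text):
    """Parse SPEC file content into a dict of field name -> value string."""
    fields = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        # Split at the first "=" so that values may contain "=" themselves.
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a field=value line: {raw_line!r}")
        fields[name.strip()] = value.strip()
    return fields


def parse_spec_date(value):
    """Parse a date given as yyyy-MM-dd, e.g. 2020-03-15."""
    return datetime.strptime(value, "%Y-%m-%d").date()
```

Because the fields may appear in any order, a dictionary is a natural fit. Values are kept as strings and converted only where a specific type, such as a date, is needed.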

Epi Metadata Fields

ef.Sample.alias_id
ef.Sample.isolationDate
ef.Sample.receiptDate
ef.Sample.sample_id_of_collector
ef.Sample.sender
ef.Sample.comment
ef.Sample.modifiedDate
ef.Sample.createdDate
ef.Sample.submittedDate
ef.Sample.downloaded_from
ef.Sample.submitted_to
ef.Source.source_type
ef.Source.source_subtype
ef.Source.host
ef.Source.host_age
ef.Source.host_sex
ef.Source.host_disease
ef.Source.isolation_source
ef.Source.isolation_country
ef.Source.isolation_state
ef.Source.isolation_city
ef.Source.isolation_zip
ef.Source.isolation_lat_long
ef.Source.lat_long_resolution
ef.Source.cluster_outbreak
ef.Source.epi_info
ef.Source.case_id
ef.Source.ecdc_case_id
ef.Characteristic.genus
ef.Characteristic.species
ef.Characteristic.subspecies
ef.Characteristic.strain
ef.Characteristic.genotype
ef.Characteristic.serotype
ef.Characteristic.pathotype
ef.Characteristic.identification_method
ef.Characteristic.identification_kit_vendor
ef.Characteristic.culture_collection
ef.Characteristic.pubmed_id
ef.Characteristic.study
ef.Characteristic.ncbi_accession
ef.Characteristic.experiment_accession
ef.Characteristic.sample_accession
ef.Characteristic.study_accession
ef.Report.report_comment

Procedure Details and Statistics Fields

pf.library_source
pf.library_strategy
pf.sequencing_protocol
pf.sequencing_vendor
pf.assembly_pre-processing
pf.assembly_type
pf.assembler
pf.assembler_version
pf.assembler_parameters
pf.assembly_post-processing
pf.expected_genome_size_for_downsampling
pf.downsampled_to_coverage
pf.top_species_match
pf.top_species_match_identity
pf.top_species_match_shared-hashes
pf.contamination_check_result
pf.fastqc_per_base_sequence_quality_(forward_reads)
pf.fastqc_per_base_sequence_quality_(reverse_reads)
pf.fastqc_adapter_content
pf.avg._coverage_(unassembled)
pf.avg._coverage_(processed,_unassembled)
pf.avg._read_length_(unassembled)
pf.avg._read_length_(processed,_unassembled)
pf.read_count_(unassembled)
pf.read_count_(processed,_unassembled)
pf.read_base_count_(unassembled)
pf.read_base_count_(processed,_unassembled)
pf.contig_count_(assembled)
pf.n50_(assembled)
pf.read_count_(assembled)
pf.read_fwd_count_(assembled)
pf.read_rev_count_(assembled)
pf.consensus_base_count_(assembled)
pf.approximated_genome_size_(mbases)
pf.max_contig_length_(assembled)
pf.min_contig_length_(assembled)
pf.avg._contig_length_(assembled)
pf.avg._coverage_(assembled)
pf.read_base_count_(assembled)
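
An external pipeline could derive some of the contig statistics above from its FASTA output before writing them to a SPEC file. The sketch below is an assumption-laden illustration: the function names are hypothetical, N50 is computed by its usual definition, the average is rounded to a whole number for simplicity, and the exact definitions and number formats SeqSphere+ uses for these fields are not stated on this page.

```python
def contig_lengths(fasta_text):
    """Return the sequence length of each record in FASTA-formatted text."""
    lengths = []
    for raw_line in fasta_text.splitlines():
        line = raw_line.strip()
        if line.startswith(">"):
            lengths.append(0)  # a header line starts a new contig
        elif line and lengths:
            lengths[-1] += len(line)
    return lengths


def assembly_stats(lengths):
    """Map contig lengths to a few pf.* fields (illustrative subset only)."""
    if not lengths:
        raise ValueError("no contigs given")
    total = sum(lengths)
    # N50: length of the contig at which the running sum over the contigs,
    # sorted from longest to shortest, reaches half of the total length.
    running = 0
    n50 = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "pf.contig_count_(assembled)": len(lengths),
        "pf.n50_(assembled)": n50,
        "pf.max_contig_length_(assembled)": max(lengths),
        "pf.min_contig_length_(assembled)": min(lengths),
        "pf.avg._contig_length_(assembled)": round(total / len(lengths)),
    }
```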

Raw Read File Path Fields

fl.reads.1
fl.reads.2