Default: Merging Reads

If the sequencing must be repeated for a sample (e.g., because the coverage was too low), it is normally desirable to assemble the new FASTQ files together with the old ones that were already processed into a sample database entry with SeqSphere. When the pipeline detects new FASTQ files for an existing sample, SeqSphere automatically assembles all FASTQ files of the sample together and replaces the existing task entry,

  • if the new FASTQ files have a different file name than the old FASTQ files that were used for the sample, and
  • if the old FASTQ files are still available at the location that was stored in the existing sample.

If a new FASTQ file has the same name as an old FASTQ file of the existing sample, the sample is skipped in the pipeline. If the old FASTQ files of the existing sample can no longer be accessed, the sample is also skipped in the pipeline, and a warning is added to the pipeline log.
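The rules above can be sketched as a small decision function. This is only an illustration of the described behavior, not SeqSphere's implementation; the function name `default_action`, its return values, and the representation of the stored file locations as a list of paths are assumptions made for this example.

```python
from pathlib import Path


def default_action(new_fastqs, old_fastqs):
    """Decide what happens to an existing sample (hypothetical helper).

    new_fastqs: paths of FASTQ files found in the pipeline input directory
    old_fastqs: file locations stored in the existing sample (assumed format)
    """
    old_names = {Path(p).name for p in old_fastqs}

    # A new file with the same name as an old file: the sample is skipped.
    if any(Path(p).name in old_names for p in new_fastqs):
        return "skip"

    # Old files no longer accessible: the sample is skipped and a warning is logged.
    if not all(Path(p).is_file() for p in old_fastqs):
        return "skip_with_warning"

    # Otherwise all old and new files are assembled together.
    return "merge"
```

Only the file name, not the directory, is compared in the first check, which mirrors the condition that the new files must have a different file name than the old ones.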

The default mechanism performs a "merging" of the new and the old FASTQ files, i.e., they are all used for the assembly. If the old FASTQ files should no longer be used, the task entries of the sample must be deleted manually before starting the pipeline. To delete them, load the existing sample into the workspace, right-click on a task entry node, and choose Remove. Finally, save the sample to store the changes. When the pipeline is then started, it will process all FASTQ files that are found for the samples in the input directory and will add new task entries to the samples whose task entries were deleted.

Advanced Approach: Using Repeater Tag

The default mechanism will work for most scenarios. However, in some cases (e.g., automatic genotyping done only with the new FASTQs) it may be useful to define one or more repeater tags in the pipeline script and add the tag to the FASTQ file names of the re-sequenced samples (multiple repeater tags can be used if Samples must be re-sequenced multiple times). Once a pipeline script defines a repeater tag, the default mechanism is turned off.

As an example, assume there are two FASTQ files of a Sample from the first sequencing run:

 Sample001_S01_L001_R1_001.fastq.gz
 Sample001_S01_L001_R2_001.fastq.gz

They are processed by a SeqSphere pipeline and the results are saved in a Sample to the database.

However, their coverage is too low for a reliable typing result. Therefore, the Sample needs to be sequenced again.

The FASTQs of this re-sequenced Sample should then have an additional field in the file name (e.g., defined in the Illumina sample sheet), usually separated with the "-" character as Field Delimiter. In this example the term "REPEATED" is used as Repeater Tag, but any text can be used. The FASTQs produced by the second run should therefore have names like:

 Sample001-REPEATED_S16_L001_R1_001.fastq.gz
 Sample001-REPEATED_S16_L001_R2_001.fastq.gz

Additionally, two changes in the pipeline script are required:

  • The term "REPEATED" must be entered as Repeater Tag in the Advanced General Settings section of the general settings of the pipeline script.
  • The file naming must define the position of the tag in the name of the FASTQ files. In this example this is done by setting the delimiter to "-" and the field position for the Sample Tag(s) to "2".

With these settings, a new assembly is done just from the new FASTQs.
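How the delimiter and field position locate the tag can be illustrated with a short parsing sketch. It is not SeqSphere's parser: the function `parse_fastq_name`, the regular expression for the Illumina file name suffix, and the assumption that the sample ID is the first field are all introduced here for illustration.

```python
import re

# Assumed Illumina-style suffix, e.g. "_S16_L001_R1_001.fastq.gz"
ILLUMINA_SUFFIX = re.compile(r"_S\d+_L\d{3}_R[12]_\d{3}\.fastq\.gz$")


def parse_fastq_name(file_name, delimiter="-", tag_field=2,
                     repeater_tags=("REPEATED",)):
    """Return (sample_id, repeater_tag or None) for a FASTQ file name."""
    # Strip the run-specific suffix so only the sample name part remains.
    name_part = ILLUMINA_SUFFIX.sub("", file_name)
    fields = name_part.split(delimiter)

    # Assumption: the sample ID is always the first field.
    sample_id = fields[0]

    # The field position is 1-based, as in the pipeline script setting.
    tag = fields[tag_field - 1] if len(fields) >= tag_field else None
    if tag not in repeater_tags:
        tag = None
    return sample_id, tag


print(parse_fastq_name("Sample001_S01_L001_R1_001.fastq.gz"))
print(parse_fastq_name("Sample001-REPEATED_S16_L001_R1_001.fastq.gz"))
```

Under these assumptions both file names resolve to the same sample ID, and only the second one carries the repeater tag.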

Optionally, if you want to assemble the reads from both runs, the FASTQs of the first run must be copied manually into the input directory of the pipeline. In this example we aim for a "merged assembly". Therefore, the input directory must contain the following four FASTQ files:

 Sample001_S01_L001_R1_001.fastq.gz
 Sample001_S01_L001_R2_001.fastq.gz
 Sample001-REPEATED_S16_L001_R1_001.fastq.gz
 Sample001-REPEATED_S16_L001_R2_001.fastq.gz

If the pipeline is now started, the existing sequence data of Sample001 will be replaced by a new assembly of all four FASTQ files. The Sample will get the tag "REPEATED" and will therefore not be overwritten if the pipeline is started again with the same input files.


If a Sample must be re-sequenced more than once, multiple repeater tags must be used. For example, the six FASTQ files of three runs could be named as follows:

 Sample001_S01_L001_R1_001.fastq.gz
 Sample001_S01_L001_R2_001.fastq.gz
 Sample001-REPEATED_S16_L001_R1_001.fastq.gz
 Sample001-REPEATED_S16_L001_R2_001.fastq.gz
 Sample001-REPEATED2_S05_L001_R1_001.fastq.gz
 Sample001-REPEATED2_S05_L001_R2_001.fastq.gz

Furthermore, the repeater tag in the pipeline script must then be set to "REPEATED,REPEATED2". Again, all FASTQ files that should be combined must be located in the input directory of the pipeline. After processing, the Sample will get the additional tag "REPEATED2".
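As a rough sketch of how the comma-separated setting relates to the files in the input directory, the following groups file names by sample and collects the repeater tags found. The function `group_by_sample`, the suffix pattern, and the assumption that the tag is the second "-"-separated field are illustrative only and do not describe SeqSphere internals.

```python
import re
from collections import defaultdict

# Assumed Illumina-style suffix, e.g. "_S05_L001_R2_001.fastq.gz"
SUFFIX = re.compile(r"_S\d+_L\d{3}_R[12]_\d{3}\.fastq\.gz$")


def group_by_sample(file_names, repeater_setting="REPEATED,REPEATED2",
                    delimiter="-"):
    """Group FASTQ file names by sample ID and collect their repeater tags."""
    # The setting is a comma-separated list of tags.
    tags = [t.strip() for t in repeater_setting.split(",") if t.strip()]

    groups = defaultdict(lambda: {"files": [], "tags": set()})
    for name in file_names:
        fields = SUFFIX.sub("", name).split(delimiter)
        entry = groups[fields[0]]  # assumption: first field is the sample ID
        entry["files"].append(name)
        # Assumption: the tag, if present, is the second field.
        if len(fields) > 1 and fields[1] in tags:
            entry["tags"].add(fields[1])
    return dict(groups)


files = [
    "Sample001_S01_L001_R1_001.fastq.gz",
    "Sample001_S01_L001_R2_001.fastq.gz",
    "Sample001-REPEATED_S16_L001_R1_001.fastq.gz",
    "Sample001-REPEATED_S16_L001_R2_001.fastq.gz",
    "Sample001-REPEATED2_S05_L001_R1_001.fastq.gz",
    "Sample001-REPEATED2_S05_L001_R2_001.fastq.gz",
]
groups = group_by_sample(files)
print(len(groups["Sample001"]["files"]), sorted(groups["Sample001"]["tags"]))
```

With the six example files, all of them fall into the single group for Sample001, carrying both tags.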