By default, a pipeline processes only "new data": sequence data files are skipped if their Sample ID belongs to a Sample that already exists in the database and already has task entries for the tasks processed by the pipeline.

Therefore, if a sequencing run was repeated using the same Sample ID, it will not be imported if the Sample with the processed sequence data still exists in the database.

One of the following two approaches can be used to process the combined data of the two runs:

Simple Approach

  1. Load the Sample that was created by processing the first run into the workspace. Delete the Task Entry/Entries from the Sample (right-click on the Task Entry node) and save it.
  2. Place all FASTQ files in the input directory that is used by the pipeline (e.g. the two FASTQ files of the first run and the two FASTQ files of the second run). According to the file naming settings, all FASTQ files must belong to the same Sample ID.
  3. Start the pipeline again.

The pipeline will process all FASTQ files that are found for the Sample in the input directory and will add new task entries to the Sample.
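
Before restarting the pipeline, it can help to check that all FASTQ files in the input directory really resolve to the same Sample ID. The following is a minimal sketch and not part of SeqSphere: it assumes the standard Illumina naming scheme (<SampleID>_S<n>_L<lane>_R<read>_001.fastq.gz), and the function name group_by_sample_id as well as the second-run file names (S16) are hypothetical.

```python
import re
from collections import defaultdict

# Assumed Illumina-style name: <SampleID>_S<n>_L<lane>_R<read>_001.fastq.gz
FASTQ_NAME = re.compile(
    r"^(?P<sample_id>.+)_S\d+_L\d{3}_R[12]_\d{3}\.fastq(\.gz)?$"
)

def group_by_sample_id(file_names):
    """Group FASTQ file names by the Sample ID parsed from the name."""
    groups = defaultdict(list)
    for name in file_names:
        match = FASTQ_NAME.match(name)
        if match:  # names that do not follow the assumed scheme are ignored
            groups[match.group("sample_id")].append(name)
    return dict(groups)

# Hypothetical input directory content: two runs, same Sample ID
input_files = [
    "Sample001_S01_L001_R1_001.fastq.gz",
    "Sample001_S01_L001_R2_001.fastq.gz",
    "Sample001_S16_L001_R1_001.fastq.gz",
    "Sample001_S16_L001_R2_001.fastq.gz",
]
groups = group_by_sample_id(input_files)
```

If groups contains exactly one key holding all four files, the files of both runs belong to the same Sample ID under the assumed naming scheme.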

More Complex Approach: Using Repeater Tag

The second way is more complex and is meant for routine usage where runs with low coverage or bad quality are often repeated. In this case a repeater tag is defined in the pipeline script and must be added to the FASTQ file names of the second run.

As an example there are two FASTQ files of a first sequencing run:

 Sample001_S01_L001_R1_001.fastq.gz
 Sample001_S01_L001_R2_001.fastq.gz

They are processed by a SeqSphere pipeline and the results are saved as Samples in the database.

However, their coverage is too low for a reliable typing result. Therefore, a second sequencing run needs to be performed.

The FASTQ files of this second run should carry an extension of the Sample ID (set e.g. in the Illumina sample sheet), normally separated from the real Sample ID by a "-". This example uses the term "REPEATED", but any text can be used. The FASTQ files produced by the second run should therefore have names like

 Sample001-REPEATED_S16_L001_R1_001.fastq.gz
 Sample001-REPEATED_S16_L001_R2_001.fastq.gz

Additionally, two changes in the pipeline script are required:

  • The term "REPEATED" must be entered as the repeater tag in the Advanced General Settings section of the pipeline script.
  • The file naming must define the position of the tag in the name of the FASTQ files. In this example, this is done by setting the delimiter to "-" and the field position for the Sample Tag(s) to "2".
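
The effect of these two settings on a file name can be illustrated as follows. This is only a sketch of the idea, not SeqSphere's actual parser: the function parse_fastq_name and its parameters are hypothetical, and it assumes that the Sample ID itself contains neither "_" nor the delimiter.

```python
def parse_fastq_name(file_name, delimiter="-", tag_position=2,
                     repeater_tags=("REPEATED",)):
    """Split a FASTQ file name into (sample_id, tag, is_repeat).

    Assumption: the first "_"-separated field holds the Sample ID,
    optionally followed by the delimiter and a tag.
    """
    sample_field = file_name.split("_", 1)[0]
    fields = sample_field.split(delimiter)
    sample_id = fields[0]
    # tag_position is 1-based, mirroring the field position "2" in the text
    tag = fields[tag_position - 1] if len(fields) >= tag_position else None
    is_repeat = tag in repeater_tags
    return sample_id, tag, is_repeat

first_run = parse_fastq_name("Sample001_S01_L001_R1_001.fastq.gz")
second_run = parse_fastq_name("Sample001-REPEATED_S16_L001_R1_001.fastq.gz")
```

Under these assumptions, both names resolve to the Sample ID "Sample001", and only the second one carries the repeater tag.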

Finally, if the reads from both runs should be assembled together (this is optional), the FASTQ files of the first run must be copied manually to the input directory of the pipeline. In this example, the input directory must then contain the following four FASTQ files:

 Sample001_S01_L001_R1_001.fastq.gz
 Sample001_S01_L001_R2_001.fastq.gz
 Sample001-REPEATED_S16_L001_R1_001.fastq.gz
 Sample001-REPEATED_S16_L001_R2_001.fastq.gz

If the pipeline is now started, the existing sequence data of Sample001 will be replaced by a new assembly of all four FASTQ files. The Sample will get the tag "REPEATED", and will therefore not be overwritten if the pipeline is started again with the same input files.


If more than one repetition of a run is necessary, multiple repeater tags must be used. So for example, the six FASTQ files of three runs can be named:

 Sample001_S01_L001_R1_001.fastq.gz
 Sample001_S01_L001_R2_001.fastq.gz
 Sample001-REPEATED_S16_L001_R1_001.fastq.gz
 Sample001-REPEATED_S16_L001_R2_001.fastq.gz
 Sample001-REPEATED2_S05_L001_R1_001.fastq.gz
 Sample001-REPEATED2_S05_L001_R2_001.fastq.gz

and the repeater tag in the pipeline script must then be set to "REPEATED,REPEATED2". Again, all FASTQ files that should be combined must be located in the input directory of the pipeline. The Sample will get an additional tag "REPEATED2".
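
How a comma-separated repeater tag setting maps the six files to one Sample can be sketched like this. The code is an illustration under assumptions, not the SeqSphere implementation: the function combine_runs is hypothetical, tags are assumed to be matched exactly (so "REPEATED" does not match "REPEATED2"), and treating an unknown suffix as part of a separate Sample ID is a choice made for this sketch only.

```python
def combine_runs(file_names, repeater_tag_setting, delimiter="-"):
    """Collect files and repeater tags per base Sample ID."""
    known_tags = [t.strip() for t in repeater_tag_setting.split(",") if t.strip()]
    combined = {}
    for name in file_names:
        sample_field = name.split("_", 1)[0]
        sample_id, _, tag = sample_field.partition(delimiter)
        if tag and tag not in known_tags:
            # Assumption: an unknown suffix belongs to a different Sample ID
            sample_id, tag = sample_field, ""
        entry = combined.setdefault(sample_id, {"files": [], "tags": []})
        entry["files"].append(name)
        if tag and tag not in entry["tags"]:
            entry["tags"].append(tag)
    return combined

files = [
    "Sample001_S01_L001_R1_001.fastq.gz",
    "Sample001_S01_L001_R2_001.fastq.gz",
    "Sample001-REPEATED_S16_L001_R1_001.fastq.gz",
    "Sample001-REPEATED_S16_L001_R2_001.fastq.gz",
    "Sample001-REPEATED2_S05_L001_R1_001.fastq.gz",
    "Sample001-REPEATED2_S05_L001_R2_001.fastq.gz",
]
result = combine_runs(files, "REPEATED,REPEATED2")
```

With the setting "REPEATED,REPEATED2", all six files are grouped under "Sample001" and both tags are recorded. If only "REPEATED" were configured, the two REPEATED2 files would not be assigned to Sample001 in this sketch.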