Preparation
Choose Input Files
Use the menu function
A Project and at least one Task Template must be selected. If only a single Task Template is selected, the process can be limited to specific targets using the Define Targets checkbox. In the Input Sequence Data section, the files with whole genome sequence data can be selected. It is possible to either add files using
Allowed input file formats are FASTA, GenBank, SAM/BAM, and ACE files.
To modify the name of the created Sample
If the Sample-ID column contains the name of an existing Sample, the contigs will be added to this Sample. Press the button Show Advanced Settings to show additional parameters:
Click OK to start the "Create Samples from Genome Sequences" process.
Preview the Results
During this process, the reference sequences of the selected Task Templates are searched for in the input data (using BLAST). If exactly one hit is found that fulfills the defined thresholds, it is taken as the correct position of the target in the input data. When the process has finished in non-batch mode, a table with all hits found is shown for each input data file. Each row in this table represents one target that was searched for. Rows highlighted in red do not fulfill the defined thresholds. Rows for targets that already exist in a Sample with the same name are disabled. To enable overwriting of existing target sequences, mark the checkbox Allow to replace existing targets.
The first column of the table shows a checkbox that defines whether the found region should be extracted as the sequence for the searched target. By default, only targets that fulfill the thresholds unambiguously and that are not already present in an existing Sample are selected. The thresholds can be changed in this preview; the selection marks in the first column are then updated automatically. The selection marks can also be changed manually, row by row. Press the confirm button at the bottom of the window to create the new Samples or to extend existing ones.
Import the Results
Now the regions that match the target reference sequences are extracted from the input data and added to new or existing Samples. If the input data contains read information (ACE file), the aligned reads for these regions are also extracted and imported according to the advanced settings.
SAM/BAM files
For SAM/BAM files (which contain reference-mapped data), a special consensus caller is used. If a SAM/BAM file does not contain a reference sequence, a dialog window opens in which a FASTA file with the reference sequence can be specified. The sequence names in the FASTA file must match the names in the SAM/BAM file.
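Before importing, it can help to check that the reference names used in a SAM file are present in the FASTA file. The following is a minimal sketch in plain Python, not part of the software itself. It assumes uncompressed SAM text (BAM would need to be converted first) and assumes that a FASTA sequence name is the first whitespace-delimited word after ">"; whether the application compares names in exactly this way is not stated here. The function names are hypothetical.

```python
def fasta_names(fasta_text):
    """Return the sequence names (first word after '>') found in FASTA text."""
    names = set()
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def sam_reference_names(sam_text):
    """Return reference names from @SQ header lines and the RNAME column."""
    names = set()
    for line in sam_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        if line.startswith("@"):
            # Header line: only @SQ records carry reference names (SN tag).
            if fields[0] == "@SQ":
                for field in fields[1:]:
                    if field.startswith("SN:"):
                        names.add(field[3:])
        elif len(fields) >= 3 and fields[2] != "*":
            # Alignment line: column 3 is RNAME, '*' means unmapped.
            names.add(fields[2])
    return names


def missing_references(sam_text, fasta_text):
    """Return the SAM reference names that have no matching FASTA entry."""
    return sorted(sam_reference_names(sam_text) - fasta_names(fasta_text))
```

An empty list from `missing_references` means every reference name in the SAM text has a FASTA entry with the same name.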
Reads with a mapping quality below a given threshold (default 10) are discarded when the SAM/BAM file is read. The threshold can be set in the Preferences.
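The effect of the mapping quality threshold can be illustrated with a short sketch. It is an illustration only and not the application's own reader: it assumes uncompressed SAM text lines, relies on the SAM convention that the fifth tab-separated column holds the mapping quality (MAPQ), and uses the hypothetical function name `filter_by_mapq`.

```python
def filter_by_mapq(sam_lines, min_mapq=10):
    """Keep alignment lines whose MAPQ (SAM column 5) is at least min_mapq.

    Header lines (starting with '@') and lines with fewer than the 11
    mandatory SAM columns are skipped.
    """
    kept = []
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue
        # Reads below the threshold are discarded, so the threshold itself passes.
        if int(fields[4]) >= min_mapq:
            kept.append(line)
    return kept
```

With the default of 10, a read with MAPQ 9 is dropped and a read with MAPQ 10 is kept.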