SeqSphere+ uses NCBI BLAST to perform local similarity searches. The following descriptions therefore apply to both local and NCBI GenBank queries.

Orientation

The orientation of the hit relative to the query sequence. If forward, the hit has the same orientation as the query sequence.

Description

The name of the hit or its description in NCBI GenBank.

Accession

The GenBank Accession number (only for NCBI GenBank queries).

Bit Score

The bit score, <math>S'</math>, is derived from the raw alignment score <math>S</math> by taking into account the statistical properties of the scoring system used. The raw score is normalized using the formula:

<math>S' = \frac{\lambda S - \ln K}{\ln 2}</math>

Here <math>\lambda</math> and <math>K</math> are statistical parameters that depend on the scoring system (substitution matrix and gap costs) employed. For determining <math>S'</math>, the more important of these parameters is <math>\lambda</math>. The lambda ratio quoted here is the ratio of <math>\lambda</math> for the given scoring system to that for one using the same substitution scores, but with infinite gap costs. This ratio indicates what proportion of information in an ungapped alignment must be sacrificed in the hope of improving its score through extension using gaps. Empirically, it has been found that the most effective gap costs tend to be those with <math>\lambda</math> ratios in the range 0.8 to 0.9. (cited from NCBI)
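As a rough illustration of the normalization above, the following Python sketch converts a raw score into a bit score. The function name `bit_score` and the example values for <math>\lambda</math> and <math>K</math> are assumptions chosen for illustration only; they are not taken from SeqSphere+, and real values depend on the scoring system in use.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw alignment score S into a bit score S'.

    Implements S' = (lambda * S - ln K) / ln 2.
    """
    if lam <= 0 or k <= 0:
        raise ValueError("lambda and K must be positive")
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative parameters only (assumed, not read from any BLAST run).
print(bit_score(raw_score=100, lam=0.267, k=0.041))
```

Because <math>\lambda</math> multiplies the raw score, it dominates the result for all but very small scores, which matches the remark that it is the more important of the two parameters.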

Raw Score

The raw alignment score, <math>S</math>, is calculated as the sum of substitution and gap scores. Substitution scores are given by a look-up table (see PAM, BLOSUM). Gap scores are typically calculated from <math>G</math>, the gap opening penalty, and <math>L</math>, the gap extension penalty: for a gap of length <math>n</math>, the gap cost is <math>G+Ln</math>. The choice of the gap costs <math>G</math> and <math>L</math> is empirical, but it is customary to choose a high value for <math>G</math> (10-15) and a low value for <math>L</math> (1-2). (cited from NCBI)
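The following Python sketch shows how a raw score could be assembled from substitution scores and gap costs of the form <math>G+Ln</math>. It is a simplified illustration, not the BLAST implementation: a plain match/mismatch scheme stands in for a substitution matrix, and the function names and parameter values are assumptions.

```python
def gap_cost(length: int, gap_open: int, gap_extend: int) -> int:
    """Cost of a single gap of `length` positions: G + L * n."""
    return gap_open + gap_extend * length


def raw_score(query: str, hit: str, match: int, mismatch: int,
              gap_open: int, gap_extend: int) -> int:
    """Sum substitution scores and subtract the cost of every gap run.

    `query` and `hit` are equal-length aligned strings with '-' for gaps.
    `mismatch` is expected to be negative (assumed simple scoring scheme).
    """
    if len(query) != len(hit):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    run_length = 0        # length of the gap run currently being scanned
    run_in_query = False  # True if the current run is a gap in the query
    for q, h in zip(query, hit):
        if q == "-" or h == "-":
            in_query = q == "-"
            # A gap switching from one sequence to the other starts a new run.
            if run_length and in_query != run_in_query:
                score -= gap_cost(run_length, gap_open, gap_extend)
                run_length = 0
            run_in_query = in_query
            run_length += 1
            continue
        if run_length:
            score -= gap_cost(run_length, gap_open, gap_extend)
            run_length = 0
        score += match if q == h else mismatch
    if run_length:
        score -= gap_cost(run_length, gap_open, gap_extend)
    return score


print(raw_score("AC--GT", "ACTTGT", match=1, mismatch=-3,
                gap_open=5, gap_extend=2))
```

With a high opening penalty and a low extension penalty, one long gap costs less than several short gaps of the same total length, which is the usual motivation for this cost model.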

Total Score

The sum of the Bit Scores for all hits that belong to the same subject.
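A minimal Python sketch of this aggregation is shown below. The input layout (pairs of subject identifier and bit score) and the function name are assumptions made for the example.

```python
from collections import defaultdict


def total_scores(hits):
    """Sum the bit scores of all hits that share the same subject.

    `hits` is an iterable of (subject_id, bit_score) pairs (assumed layout).
    """
    totals = defaultdict(float)
    for subject_id, score in hits:
        totals[subject_id] += score
    return dict(totals)


# Hypothetical hits: subject "s1" is matched by two separate alignments.
print(total_scores([("s1", 50.0), ("s2", 30.0), ("s1", 25.5)]))
```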

Identity

The extent to which two (nucleotide or amino acid) sequences are invariant.

Ambiguity Identity

An identity calculation in which ambiguity codes are scored using the following matrix:

	A	C	G	T	R	Y	K	M	S	W	B	D	H	V	N
A	1	0	0	0	0.5	0	0	0.5	0	0.5	0	0.33	0.33	0.33	0.25
C	0	1	0	0	0	0.5	0	0.5	0.5	0	0.33	0	0.33	0.33	0.25
G	0	0	1	0	0.5	0	0.5	0	0.5	0	0.33	0.33	0	0.33	0.25
T	0	0	0	1	0	0.5	0.5	0	0	0.5	0.33	0.33	0.33	0	0.25
R	0.5	0	0.5	0	1	0	0.25	0.25	0.25	0.25	0.17	0.33	0.17	0.33	0.25
Y	0	0.5	0	0.5	0	1	0.25	0.25	0.25	0.25	0.33	0.17	0.33	0.17	0.25
K	0	0	0.5	0.5	0.25	0.25	1	0	0.25	0.25	0.33	0.33	0.17	0.17	0.25
M	0.5	0.5	0	0	0.25	0.25	0	1	0.25	0.25	0.17	0.17	0.33	0.33	0.25
S	0	0.5	0.5	0	0.25	0.25	0.25	0.25	1	0	0.33	0.17	0.17	0.33	0.25
W	0.5	0	0	0.5	0.25	0.25	0.25	0.25	0	1	0.17	0.33	0.33	0.17	0.25
B	0	0.33	0.33	0.33	0.17	0.33	0.33	0.17	0.33	0.17	1	0.22	0.22	0.22	0.25
D	0.33	0	0.33	0.33	0.33	0.17	0.33	0.17	0.17	0.33	0.22	1	0.22	0.22	0.25
H	0.33	0.33	0	0.33	0.17	0.33	0.17	0.33	0.17	0.33	0.22	0.22	1	0.22	0.25
V	0.33	0.33	0.33	0	0.33	0.17	0.17	0.33	0.33	0.17	0.22	0.22	0.22	1	0.25
N	0.25	0.25	0.25	0.25	0.25	0.25	0.25	0.25	0.25	0.25	0.25	0.25	0.25	0.25	0.25
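The values in the matrix appear to follow a simple pattern: each IUPAC code stands for a set of bases, and the score of two different codes is the number of shared bases divided by the product of the two set sizes, while identical codes score 1 (except N against N, which is 0.25 in the table). The Python sketch below reproduces the table under that assumption. It is an observation about the listed values, not a documented SeqSphere+ algorithm, and the names used are hypothetical.

```python
# Bases represented by each IUPAC nucleotide code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "K": "GT", "M": "AC", "S": "CG", "W": "AT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def pair_score(a: str, b: str) -> float:
    """Score two IUPAC codes the way the table above appears to."""
    if a == b and a != "N":
        return 1.0  # identical codes score 1, except N vs N
    set_a, set_b = set(IUPAC[a]), set(IUPAC[b])
    # Probability that a random base from each set is the same base.
    return len(set_a & set_b) / (len(set_a) * len(set_b))


for a, b in [("A", "R"), ("R", "K"), ("R", "B"), ("B", "D"), ("N", "N")]:
    print(a, b, round(pair_score(a, b), 2))
```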

Query Aligned

The length of the Query Overlap relative to the length of the query sequence.

Hit Aligned

The length of the Hit Overlap relative to the length of the hit sequence.

Query Overlap

The length of the part of the query sequence that was aligned to the hit sequence. It is calculated as the span between the begin and end of the alignment in the query sequence.

Hit Overlap

The length of the part of the hit sequence that was aligned to the query sequence. It is calculated as the span between the begin and end of the alignment in the hit sequence.
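The relationship between the overlap and aligned values can be sketched in Python as follows. The sketch assumes 1-based, inclusive alignment coordinates, as reported by BLAST, and uses the absolute difference so that reverse-oriented hits (begin greater than end) give the same span. The function names are hypothetical, and whether SeqSphere+ reports the aligned value as a fraction or a percentage is not stated here.

```python
def overlap(begin: int, end: int) -> int:
    """Span between alignment begin and end (1-based, inclusive; assumed)."""
    return abs(end - begin) + 1


def aligned_fraction(begin: int, end: int, sequence_length: int) -> float:
    """Overlap relative to the full length of the sequence."""
    if sequence_length <= 0:
        raise ValueError("sequence length must be positive")
    return overlap(begin, end) / sequence_length


# Hypothetical alignment covering positions 101..350 of a 500 bp query.
print(overlap(101, 350), aligned_fraction(101, 350, 500))
```

The same two functions apply to the hit side by passing the hit coordinates and the Hit Length.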

Hit Length

The full length of the hit sequence in the database.

Mismatches

The number of mismatches in the hit alignment.

Gaps

The number of gaps in the hit alignment.

Ambiguities

The number of ambiguities in the overlap of the hit sequence.
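The three counts above can be illustrated with a short Python sketch that walks over a pairwise nucleotide alignment. Several details are assumptions, since they are not specified here: gaps are counted per gap position rather than per gap opening, a column containing an ambiguity code that differs from the query base is also counted as a mismatch, and ambiguities are counted only in the hit sequence. The function name is hypothetical.

```python
def alignment_counts(query: str, hit: str) -> dict:
    """Count mismatches, gap positions and hit ambiguities in an alignment.

    `query` and `hit` are equal-length aligned strings with '-' for gaps.
    """
    if len(query) != len(hit):
        raise ValueError("aligned sequences must have equal length")
    unambiguous = set("ACGT")
    counts = {"mismatches": 0, "gaps": 0, "ambiguities": 0}
    for q, h in zip(query.upper(), hit.upper()):
        if q == "-" or h == "-":
            counts["gaps"] += 1          # one count per gap position
        elif q != h:
            counts["mismatches"] += 1
        if h != "-" and h not in unambiguous:
            counts["ambiguities"] += 1   # IUPAC ambiguity code in the hit
    return counts


print(alignment_counts("ACGT-A", "ACRTTA"))
```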

Bit Identity

E-value

The E-value (Expect value) is a parameter that describes the number of hits one can “expect” to see by chance when searching a database of a particular size. It decreases exponentially as the score (<math>S</math>) assigned to a match between two sequences increases. Essentially, the E-value describes the random background noise that exists for matches between sequences. For example, an E-value of 1 assigned to a hit can be interpreted as meaning that in a database of the current size, one might expect to see one match with a similar score simply by chance. This means that the lower the E-value, or the closer it is to “0”, the higher the “significance” of the match. However, it is important to note that matches to short query sequences can be virtually identical and still have relatively high E-values. This is because the calculation of the E-value also takes into account the length of the query sequence, and shorter sequences have a higher probability of occurring in the database purely by chance. (cited from NCBI)
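The dependence of the E-value on the bit score and on the size of the search space can be sketched with the standard relation E = m · n · 2^(−S'), where m is the query length and n is the total length of the database. The Python sketch below applies this relation directly. BLAST itself applies effective-length corrections, so reported values will differ; the function name and the example numbers are assumptions.

```python
def e_value(bit_score: float, query_length: int, database_length: int) -> float:
    """Expected number of chance hits with at least this bit score.

    Uses E = m * n * 2 ** (-S') without effective-length corrections.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search: 100 bp query against a 1 Mbp database.
for score in (20, 30, 40):
    print(score, e_value(score, 100, 1_000_000))
```

Each additional bit halves the E-value, which is the exponential decrease described above. A short query can only reach a small bit score even when it matches perfectly, so its E-value stays comparatively high.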
