Blast I/O¶
- Learning Objective
- In this tutorial, you will learn about different the Blast file formats and how to interact with them in SeqAn.
- Difficulty
- Average
- Duration
- 1h30min - 2h30min
- Prerequisite Tutorials
- Sequences, File I/O Overview, Alignment, Pairwise Sequence Alignment
- Other recommended reading
- Basics of Blast Statistics
- Technical requirements
- Full C++11 support required in compiler (GCC \(\ge\) 4.9, Clang \(\ge\) 3.4 or MSVC \(\ge\) 2015)
The Basic local alignment search tool (BLAST) by the NCBI is one of the most widely used tools in Bioinformatics. It supports a variety of formats which, although widely used, are not standardized and partly even declared as “subject to unannounced and undocumented change”. This makes implementing them very hard and is one of the reasons dealing with Blast IO is more difficult than with other file formats.
Furthermore it is important to distinguish between the Blast version written in C (known by its blastall executable) and the C++ version (individual blastn, blastp… executables), also known as BLAST+. The formats and their identifiers changed between versions. The following table gives an overview:
Description | blastall | blast* |
---|---|---|
pairwise | -m 0 | -outfmt 0 |
query-anchored showing identities | -m 1 | -outfmt 1 |
query-anchored no identities | -m 2 | -outfmt 2 |
flat query-anchored, show identities | -m 3 | -outfmt 3 |
flat query-anchored, no identities | -m 4 | -outfmt 4 |
query-anchored no identities and blunt ends | -m 5 | |
flat query-anchored, no identities and blunt ends | -m 6 | |
XML Blast output | -m 7 | -outfmt 5 |
tabular | -m 8 | -outfmt 6 |
tabular with comment lines | -m 9 | -outfmt 7 |
Text ASN.1 | -m 10 | -outfmt 8 |
Binary ASN.1 | -m 11 | -outfmt 9 |
Comma-separated values | -outfmt 10 | |
BLAST archive format (ASN.1) | -outfmt 11 |
The files written by blastall are considered the “legacy”-format in SeqAn. SeqAn has support for the following formats:
Format | read support | write support | based on version |
---|---|---|---|
pairwise | ✓ | Blast-2.2.26+ | |
tabular | ✓ | ✓ | Blast-2.2.26+ |
tabular (legacy) | ✓ | ✓ | Blast-2.2.26 |
tabular w comments | ✓ | ✓ | Blast-2.2.26+ |
tabular w comments (legacy) | ✓ | ✓ | Blast-2.2.26 |
Caution
Please note that Blast-2.2.26+ is not the same as Blast-2.2.26! One is version 2.2.26 of the C++ application suite (BLAST+) and the other is version 2.2.26 of the legacy application suite. There still are software releases for both generations.
There are different program modes in Blast which also influence the file format. In the legacy application suite
these where specified with the -p
parameter, in the BLAST+ suite they each have their own executable.
Program mode | query alphabet | subject alphabet |
---|---|---|
BlastN | nucleotide | nucleotide |
BlastP | protein | protein |
BlastX | translated nucl. | protein |
TBlastN | protein | translated nucl. |
TBlastX | translated nucl. | translated nucl. |
Tabular formats¶
The tabular formats are tab-seperated-value formats (TSV), with twelve columns by default. Each line represents one match (or high scoring pair in Blast terminology). The twelve default columns are:
- Query sequence ID (truncated at first whitespace)
- Subject sequence ID (truncated at first whitespace)
- Percentage of identical positions
- Alignment length
- Number of mismatches
- Number of gap openings
- Start position of alignment on query sequence
- End position of alignment on query sequence
- Start position of alignment on subject sequence
- End position of alignment on subject sequence
- Expect value (length normalized bit score)
- Bit score (statistical significance indicator)
Note
Alignment positions in Blast
- Interval notation: Blast uses 1-based closed intervals for positions, i.e. a match from the 100th position to
the 200th position of a sequence will be shown as
100 200
in the file. SeqAn internally uses 0-based half open intervals, i.e. it starts counting at position 0 and stores the first position behind the sequence as “end”, e.g. position99
and200
for our example. - Reverse strands: For matches found on the reverse complement strand the positions are counted backwards from
the end of the sequence, e.g. a match from the 100th position to the 200th position on a reverse complement strand
of a sequence of length 500 will be shown as
400 300
in the file. - Translation frames: Positions given in the file are always on the original untranslated sequence!
The writeRecord()
function automatically does all of these conversions!
A tabular file could look like this (matches per query are sorted by e-value):
SHAA004TF sp|P0A916|OMPW_SHIFL 50.43 115 49 2 389 733 1 107 1e-26 108
SHAA004TF sp|P0A915|OMPW_ECOLI 50.43 115 49 2 389 733 1 107 1e-26 108
SHAA004TF sp|P17266|OMPW_VIBCH 52.21 113 45 2 410 733 4 112 5e-26 106
SHAA004TF sp|Q8ZP50|OMPW_SALTY 50.43 115 49 2 389 733 1 107 3e-24 101
SHAA004TF sp|Q8Z7E2|OMPW_SALTI 50.43 115 49 2 389 733 1 107 3e-24 101
SHAA004TF sp|P19766|INSB_SHISO 100.00 18 0 0 803 750 114 131 4e-04 43.1
SHAA004TF sp|P19765|INSB_SHIFL 100.00 18 0 0 803 750 114 131 4e-04 43.1
SHAA004TF sp|P03831|INBD_SHIDY 100.00 18 0 0 803 750 114 131 4e-04 43.1
SHAA004TF sp|P59843|INSB_HAEDU 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P0CF31|INSB_ECOLX 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P0CF30|INSB8_ECOLI 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P0CF29|INSB6_ECOLI 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P0CF28|INSB5_ECOLI 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P57998|INSB4_ECOLI 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P0CF25|INSB1_ECOLI 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P0CF27|INSB3_ECOLI 94.44 18 1 0 803 750 150 167 0.004 40.8
SHAA004TF sp|P0CF26|INSB2_ECOLI 94.44 18 1 0 803 750 150 167 0.004 40.8
SHAA004TR sp|Q0HTA5|META_SHESR 100.00 77 0 0 232 2 1 77 3e-42 152
SHAA004TR sp|Q0HGZ8|META_SHESM 100.00 77 0 0 232 2 1 77 3e-42 152
The tabular with comment lines format additionally prefixes every block belonging to one query sequence with comment lines that include the program version, the database name and column labels. The above example would look like this:
# BLASTX 2.2.26+
# Query: SHAA003TR Sample 1 Mate SHAA003TF trimmed_to 17 935
# Database: /tmp/uniprot_sprot.fasta
# 0 hits found
# BLASTX 2.2.26+
# Query: SHAA004TF Sample 1 Mate SHAA004TR trimmed_to 25 828
# Database: /tmp/uniprot_sprot.fasta
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 17 hits found
SHAA004TF sp|P0A916|OMPW_SHIFL 50.43 115 49 2 389 733 1 107 1e-26 108
SHAA004TF sp|P0A915|OMPW_ECOLI 50.43 115 49 2 389 733 1 107 1e-26 108
SHAA004TF sp|P17266|OMPW_VIBCH 52.21 113 45 2 410 733 4 112 5e-26 106
SHAA004TF sp|Q8ZP50|OMPW_SALTY 50.43 115 49 2 389 733 1 107 3e-24 101
SHAA004TF sp|Q8Z7E2|OMPW_SALTI 50.43 115 49 2 389 733 1 107 3e-24 101
SHAA004TF sp|P19766|INSB_SHISO 100.00 18 0 0 803 750 114 131 4e-04 43.1
SHAA004TF sp|P19765|INSB_SHIFL 100.00 18 0 0 803 750 114 131 4e-04 43.1
SHAA004TF sp|P03831|INBD_SHIDY 100.00 18 0 0 803 750 114 131 4e-04 43.1
SHAA004TF sp|P59843|INSB_HAEDU 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P0CF31|INSB_ECOLX 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P0CF30|INSB8_ECOLI 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P0CF29|INSB6_ECOLI 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P0CF28|INSB5_ECOLI 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P57998|INSB4_ECOLI 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P0CF25|INSB1_ECOLI 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P0CF27|INSB3_ECOLI 94.44 18 1 0 803 750 150 167 0.004 40.8
SHAA004TF sp|P0CF26|INSB2_ECOLI 94.44 18 1 0 803 750 150 167 0.004 40.8
# BLASTX 2.2.26+
# Query: SHAA004TR Sample 1 Mate SHAA004TF trimmed_to 20 853
# Database: /tmp/uniprot_sprot.fasta
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 2 hits found
SHAA004TR sp|Q0HTA5|META_SHESR 100.00 77 0 0 232 2 1 77 3e-42 152
SHAA004TR sp|Q0HGZ8|META_SHESM 100.00 77 0 0 232 2 1 77 3e-42 152
As you can see, comment lines are also printed for query sequences which don’t have any matches. The major difference of these formats in BLAST+ vs the legacy application are that the mismatches column used to include the number of gap characters, but it does not in BLAST+. The comments also look slightly different in the tabular with comment lines (legacy) format:
# BLASTX 2.2.26 [Sep-21-2011]
# Query: SHAA003TR Sample 1 Mate SHAA003TF trimmed_to 17 935
# Database: /tmp/uniprot_sprot.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
# BLASTX 2.2.26 [Sep-21-2011]
# Query: SHAA004TF Sample 1 Mate SHAA004TR trimmed_to 25 828
# Database: /tmp/uniprot_sprot.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
SHAA004TF sp|P0A916|OMPW_SHIFL 50.43 115 57 2 389 733 1 107 1e-26 108
SHAA004TF sp|P0A915|OMPW_ECOLI 50.43 115 57 2 389 733 1 107 1e-26 108
SHAA004TF sp|P17266|OMPW_VIBCH 52.21 113 49 2 410 733 4 112 5e-26 106
SHAA004TF sp|Q8ZP50|OMPW_SALTY 50.43 115 57 2 389 733 1 107 3e-24 101
SHAA004TF sp|Q8Z7E2|OMPW_SALTI 50.43 115 57 2 389 733 1 107 3e-24 101
SHAA004TF sp|P19766|INSB_SHISO 100.00 18 0 0 803 750 114 131 4e-04 43.1
SHAA004TF sp|P19765|INSB_SHIFL 100.00 18 0 0 803 750 114 131 4e-04 43.1
SHAA004TF sp|P03831|INBD_SHIDY 100.00 18 0 0 803 750 114 131 4e-04 43.1
SHAA004TF sp|P59843|INSB_HAEDU 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P0CF31|INSB_ECOLX 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P0CF30|INSB8_ECOLI 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P0CF29|INSB6_ECOLI 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P0CF28|INSB5_ECOLI 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P57998|INSB4_ECOLI 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P0CF25|INSB1_ECOLI 100.00 18 0 0 803 750 150 167 6e-04 43.1
SHAA004TF sp|P0CF27|INSB3_ECOLI 94.44 18 1 0 803 750 150 167 0.004 40.8
SHAA004TF sp|P0CF26|INSB2_ECOLI 94.44 18 1 0 803 750 150 167 0.004 40.8
# BLASTX 2.2.26 [Sep-21-2011]
# Query: SHAA004TR Sample 1 Mate SHAA004TF trimmed_to 20 853
# Database: /tmp/uniprot_sprot.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
SHAA004TR sp|Q0HTA5|META_SHESR 100.00 77 0 0 232 2 1 77 3e-42 152
SHAA004TR sp|Q0HGZ8|META_SHESM 100.00 77 0 0 232 2 1 77 3e-42 152
Pairwise format¶
The pairwise format is the default format in Blast. It is more verbose than the tabular formats and very human readable.
This is what the last record from above would look like (the other queries are omitted):
BLASTX 2.2.26+
Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.
Database: /foo/bar/uniprot_sprot.fasta
538,259 sequences; 191,113,170 total letters
[...]
Query= SHAA004TR Sample 1 Mate SHAA004TF trimmed_to 20 853
Length=833
Score E
Sequences producing significant alignments: (Bits) Value
sp|Q0HTA5|META_SHESR Homoserine O-succinyltransferase OS=Shewan... 152 3e-42
sp|Q0HGZ8|META_SHESM Homoserine O-succinyltransferase OS=Shewan... 152 3e-42
> sp|Q0HTA5|META_SHESR Homoserine O-succinyltransferase OS=Shewanella
sp. (strain MR-7) GN=metA PE=3 SV=1
Length=313
Score = 152 bits (385), Expect = 3e-42
Identities = 77/77 (100%), Positives = 77/77 (100%), Gaps = 0/77 (0%)
Frame = -2
Query 232 MPVKIPDHLPAAGILESENIFVMSETRAANQDIRPMKVLILNLMPNKIETETQLLRLLGN 53
MPVKIPDHLPAAGILESENIFVMSETRAANQDIRPMKVLILNLMPNKIETETQLLRLLGN
Sbjct 1 MPVKIPDHLPAAGILESENIFVMSETRAANQDIRPMKVLILNLMPNKIETETQLLRLLGN 60
Query 52 TPLQVDVDLLRIHDKES 2
TPLQVDVDLLRIHDKES
Sbjct 61 TPLQVDVDLLRIHDKES 77
> sp|Q0HGZ8|META_SHESM Homoserine O-succinyltransferase OS=Shewanella
sp. (strain MR-4) GN=metA PE=3 SV=1
Length=313
Score = 152 bits (385), Expect = 3e-42
Identities = 77/77 (100%), Positives = 77/77 (100%), Gaps = 0/77 (0%)
Frame = -2
Query 232 MPVKIPDHLPAAGILESENIFVMSETRAANQDIRPMKVLILNLMPNKIETETQLLRLLGN 53
MPVKIPDHLPAAGILESENIFVMSETRAANQDIRPMKVLILNLMPNKIETETQLLRLLGN
Sbjct 1 MPVKIPDHLPAAGILESENIFVMSETRAANQDIRPMKVLILNLMPNKIETETQLLRLLGN 60
Query 52 TPLQVDVDLLRIHDKES 2
TPLQVDVDLLRIHDKES
Sbjct 61 TPLQVDVDLLRIHDKES 77
Lambda K H
0.318 0.134 0.401
Gapped
Lambda K H
0.267 0.0410 0.140
Effective search space used: 20716695286
[...]
Blast formats in SeqAn¶
There are three blast format related tags in SeqAn:
- BlastReport for the pairwise format with the FormattedFile output specialization BlastReportFileOut.
- BlastTabular for the tabular formats with the FormattedFile output and input specializations BlastTabularFileOut and BlastTabularFileIn.
- BlastTabularLL which provides light-weight, but very basic tabular IO.
The third tag can be used for simple file manipulation, e.g. filtering or column rearrangement, it is not covered in this tutorial (see the dox for a simple example).
- To work with the first two formats you need to understand at least the following data structures:
- BlastRecord: the record covers all BlastMatch es belonging to one query sequence.
- FormattedFile: one of BlastReportFileOut, BlastTabularFileOut and BlastTabularFileIn.
- BlastIOContext: the context of the FormattedFile.
The context contains file-global data like the name of the database and can also be used to read/write certain file format properties, e.g. “with comment lines” or “legacyFormat”.
Caution
Due to the structure of blast tabular files lots of information is repeated in every block of comment lines, e.g.
the database name. Because it is expected that these stay the same they are saved in the context and not the record.
You may still, however, check every time you readRecord()
if you want to make sure.
File reading example¶
Only tabular formats are covered in this example, because no input support is available for the pairwise format.
Copy the contents of the tabular with comment lines example above into a file and give it to the following
program as the only parameter. Please use .m9
as file type extension.
#include <iostream>
#include <seqan/basic.h>
#ifndef STDLIB_VS
#include <seqan/blast.h>
using namespace seqan;
int main(int argc, char **argv)
{
if (argc != 2)
{
std::cerr << "USAGE: FILE_IN\n";
return 0;
}
typedef Gaps<String<AminoAcid>, ArrayGaps> TGaps;
typedef BlastMatch<TGaps, TGaps> TBlastMatch;
typedef BlastRecord<TBlastMatch> TBlastRecord;
<< "Database: "
<< context(file).dbName
<< "\n\n";
return 0;
}
#else
int main()
{
std::cerr << "USAGE: FILE_IN\n";
return 0;
Assignment 1¶
- Objective
- Complete the above example by reading the file according to BlastTabularFileIn. For every record print the query ID, the number of contained matches and the bit-score of the best match.
- Solution
- Top
#include <iostream> #include <seqan/basic.h> #ifndef STDLIB_VS #include <seqan/blast.h> using namespace seqan; int main(int argc, char **argv) { if (argc != 2) { std::cerr << "USAGE: FILE_IN\n"; return 0; } typedef Gaps<String<AminoAcid>, ArrayGaps> TGaps; typedef BlastMatch<TGaps, TGaps> TBlastMatch; typedef BlastRecord<TBlastMatch> TBlastRecord;
- New code
typedef BlastIOContext<> TContext; BlastTabularFileIn<TContext> file(argv[1]); readHeader(file); TBlastRecord record; while (onRecord(file)) { // read the record readRecord(record, file); // print some diagnostics std::cout << "Record of query sequence \"" << record.qId << "\"\n" << "==========================================\n" std::cout << "\n\n"; } std::cout << "\n\n";
- Bottom
<< "Database: " << context(file).dbName << "\n\n"; return 0; } #else int main() { std::cerr << "USAGE: FILE_IN\n"; return 0;
Assignment 2¶
- Objective
- Study the documentation of BlastIOContext. How can you adapt the previous program to check if there were any
problems reading a record? If you have come up with a solution, try to read the file at
tests/blast/defaultfields.m9
. What does the program print and why? - Solution
- Top
#include <iostream> #include <seqan/basic.h> #ifndef STDLIB_VS #include <seqan/blast.h> using namespace seqan; int main(int argc, char **argv) { if (argc != 2) { std::cerr << "USAGE: FILE_IN\n"; return 0; } typedef Gaps<String<AminoAcid>, ArrayGaps> TGaps; typedef BlastMatch<TGaps, TGaps> TBlastMatch; typedef BlastRecord<TBlastMatch> TBlastRecord; typedef BlastIOContext<> TContext; BlastTabularFileIn<TContext> file(argv[1]); readHeader(file); TBlastRecord record; while (onRecord(file)) { // read the record readRecord(record, file); // print some diagnostics std::cout << "Record of query sequence \"" << record.qId << "\"\n" << "==========================================\n"
- New code
for (auto field : context(file).fields) std::cout << BlastMatchField<>::optionLabels[(int)field] << " "; std::cout << "\n\n"; // if there is anything unexpected, tell the user about it if (!empty(context(file).conformancyErrors)) { std::cout << "There were non critical errors when reading the record:\n"; write(std::cout, context(file).conformancyErrors); std::cout << "\n\n"; } if (!empty(context(file).otherLines)) { std::cout << "There were unidentified lines in the comments:\n"; write(std::cout, context(file).otherLines); std::cout << "\n\n"; } std::cout << "\n\n";
The program will print conformancyErrors for the last record, because there is a typo in the file (
Datacase
instead ofDatabase
).- Bottom
<< "Database: " << context(file).dbName << "\n\n"; return 0; } #else int main() { std::cerr << "USAGE: FILE_IN\n"; return 0;
Assignment 3¶
- Objective
Now that you have a basic understanding of BlastIOContext, also print the following information after reading the records:
- file format (with comment lines or without, BLAST+ or legacy?)
- blast program and version
- name of database
Verify that the results are as expected on the files
tests/blast/defaultfields.m8
,tests/blast/defaultfields.m9
andtests/blast/defaultfields_legacy.m9
.- Solution
- Top
#include <iostream> #include <seqan/basic.h> #ifndef STDLIB_VS #include <seqan/blast.h> using namespace seqan; int main(int argc, char **argv) { if (argc != 2) { std::cerr << "USAGE: FILE_IN\n"; return 0; } typedef Gaps<String<AminoAcid>, ArrayGaps> TGaps; typedef BlastMatch<TGaps, TGaps> TBlastMatch; typedef BlastRecord<TBlastMatch> TBlastRecord; typedef BlastIOContext<> TContext; BlastTabularFileIn<TContext> file(argv[1]); readHeader(file); TBlastRecord record; while (onRecord(file)) { // read the record readRecord(record, file); // print some diagnostics std::cout << "Record of query sequence \"" << record.qId << "\"\n" << "==========================================\n" for (auto field : context(file).fields) std::cout << BlastMatchField<>::optionLabels[(int)field] << " "; std::cout << "\n\n"; // if there is anything unexpected, tell the user about it if (!empty(context(file).conformancyErrors)) { std::cout << "There were non critical errors when reading the record:\n"; write(std::cout, context(file).conformancyErrors); std::cout << "\n\n"; } if (!empty(context(file).otherLines)) { std::cout << "There were unidentified lines in the comments:\n"; write(std::cout, context(file).otherLines); std::cout << "\n\n"; } std::cout << "\n\n";
- New code
} readFooter(file); std::cout << "File Format: tabular" << (context(file).tabularSpec == BlastTabularSpec::COMMENTS ? " with coment lines" : "") << '\n' << "Generation: " << (context(file).legacyFormat ? " legacy" : " BLAST+") << '\n' << "Program and version: " << context(file).versionString << '\n'
- Bottom
<< "Database: " << context(file).dbName << "\n\n"; return 0; } #else int main() { std::cerr << "USAGE: FILE_IN\n"; return 0;
Assignment 4¶
As previously mentioned, twelve columns are printed by default.
This can be changed in BLAST+, also by means of the --outfmt
parameter.
A standards compliant file with comment lines and custom column composition can be read
without further configuration in SeqAn.
Tip
Don’t believe it? Look at tests/blast/customfields.m9
, as as you can see the bit score is in the 13th column
(instead of the twelfth). If you run your program on this file, it should still print the correct bit-scores!
- Objective
Read BlastIOContext again focusing on fields and also read BlastMatchField. Now adapt the previous program to print for every record the
optionLabel
of each field used.Verify that the results are as expected on the files
tests/blast/defaultfields.m9
andtests/blast/customfields.m9
.- Solution
- Top
#include <iostream> #include <seqan/basic.h> #ifndef STDLIB_VS #include <seqan/blast.h> using namespace seqan; int main(int argc, char **argv) { if (argc != 2) { std::cerr << "USAGE: FILE_IN\n"; return 0; } typedef Gaps<String<AminoAcid>, ArrayGaps> TGaps; typedef BlastMatch<TGaps, TGaps> TBlastMatch; typedef BlastRecord<TBlastMatch> TBlastRecord; typedef BlastIOContext<> TContext; BlastTabularFileIn<TContext> file(argv[1]); readHeader(file); TBlastRecord record; while (onRecord(file)) { // read the record readRecord(record, file); // print some diagnostics std::cout << "Record of query sequence \"" << record.qId << "\"\n" << "==========================================\n"
- New code
<< "Number of HSPs: " << length(record.matches) << "\n"; if (!empty(record.matches)) std::cout << "BitScore of best HSP: " << front(record.matches).bitScore << "\n"; // print column composition std::cout << "Columns: ";
- Bottom
for (auto field : context(file).fields) std::cout << BlastMatchField<>::optionLabels[(int)field] << " "; std::cout << "\n\n"; // if there is anything unexpected, tell the user about it if (!empty(context(file).conformancyErrors)) { std::cout << "There were non critical errors when reading the record:\n"; write(std::cout, context(file).conformancyErrors); std::cout << "\n\n"; } if (!empty(context(file).otherLines)) { std::cout << "There were unidentified lines in the comments:\n"; write(std::cout, context(file).otherLines); std::cout << "\n\n"; } std::cout << "\n\n"; } readFooter(file); std::cout << "File Format: tabular" << (context(file).tabularSpec == BlastTabularSpec::COMMENTS ? " with coment lines" : "") << '\n' << "Generation: " << (context(file).legacyFormat ? " legacy" : " BLAST+") << '\n' << "Program and version: " << context(file).versionString << '\n' << "Database: " << context(file).dbName << "\n\n"; return 0; } #else int main() { std::cerr << "USAGE: FILE_IN\n"; return 0;
If this was too easy, you can also try the same for tabular files without comment lines!
File writing example¶
The following program stub creates three query sequences and two subject sequences in amino acid alphabet. We will later generate records with matches and print these to disk.
#include <iostream>
#include <seqan/basic.h>
#ifndef STDLIB_VS
#include <seqan/blast.h>
using namespace seqan;
int main(int argc, char ** argv)
{
if (argc != 2)
{
std::cerr << "USAGE: FILE_OUT\n";
return 0;
}
typedef String<AminoAcid> TSequence;
typedef std::string TId;
typedef Gaps<TSequence, ArrayGaps> TGaps;
typedef BlastMatch<TGaps, TGaps> TBlastMatch;
typedef BlastRecord<TBlastMatch> TBlastRecord;
typedef BlastIOContext<Blosum62> TContext;
StringSet<TSequence> queries;
StringSet<TSequence> subjects;
StringSet<TId> qIds;
StringSet<TId> sIds;
appendValue(queries, "VAYAQPRKLCYP");
appendValue(queries, "NAYPRUTEIN");
appendValue(queries, "AVITSFTQ");
appendValue(subjects,
"SSITEEKHIPHKEQDKDAEFLSKEALKTHMTENVLQMDRRAVQDPSTSFLQLLKAKGLLG"
"LPDYEVNLADVNSPGFRKVAYAQTKPRRLCFPNGGTRRGSFIMDTAVVVMVSLRYVNIGK"
"VIFPGATDVSEGEDEFWAGLPQAYGCLATEFLCIHIAIYSWIHVQSSRYDDMNASVIRAK"
"LNLAVITSWTQLIQAEKETI");
appendValue(subjects,
"GATRDSKGNAVITSFTQARLRVYADLLGPYWIILHVIELTGVGNTGQKCTLNHMGTYAVF"
"DLKQPPATNDLGLPKPCFIGFDIQNELAIGTVGHSEAVIAAFTQRDRLEERAESKQSLAR"
"PVISPKLIAEVSTVLESALNQMYSSLGFYRVERAEDYAQPRKLCVVKKKSFNCLNADIWL"
"EYRMEDQKSVPKVFKIMMDD");
appendValue(qIds, "Query_Numero_Uno with args");
appendValue(qIds, "Query_Numero_Dos with args");
appendValue(qIds, "Query_Numero_Tres with args");
appendValue(sIds, "Subject_Numero_Uno");
appendValue(sIds, "Subject_Numero_Dos");
writeFooter(outfile);
return 0;
}
#else
int main()
{
std::cerr << "USAGE: FILE_OUT\n";
return 0;
Assignment 5¶
- Objective
Before we can begin to align, certain properties of the BlastIOContext have to be set. Which ones? Add a block with the necessary initialization to the above code, use blast defaults where possible. Although not strictly required at this point: include a call to
writeHeader()
.Caution
Alignment score computation works slightly different in Blast and in SeqAn, please have a look at BlastScoringScheme. For the task at hand it should suffice to simply use the corresponding
set*()
functions on scoringScheme.- Solution
- Top
#include <iostream> #include <seqan/basic.h> #ifndef STDLIB_VS #include <seqan/blast.h> using namespace seqan; int main(int argc, char ** argv) { if (argc != 2) { std::cerr << "USAGE: FILE_OUT\n"; return 0; } typedef String<AminoAcid> TSequence; typedef std::string TId; typedef Gaps<TSequence, ArrayGaps> TGaps; typedef BlastMatch<TGaps, TGaps> TBlastMatch; typedef BlastRecord<TBlastMatch> TBlastRecord; typedef BlastIOContext<Blosum62> TContext; StringSet<TSequence> queries; StringSet<TSequence> subjects; StringSet<TId> qIds; StringSet<TId> sIds; appendValue(queries, "VAYAQPRKLCYP"); appendValue(queries, "NAYPRUTEIN"); appendValue(queries, "AVITSFTQ"); appendValue(subjects, "SSITEEKHIPHKEQDKDAEFLSKEALKTHMTENVLQMDRRAVQDPSTSFLQLLKAKGLLG" "LPDYEVNLADVNSPGFRKVAYAQTKPRRLCFPNGGTRRGSFIMDTAVVVMVSLRYVNIGK" "VIFPGATDVSEGEDEFWAGLPQAYGCLATEFLCIHIAIYSWIHVQSSRYDDMNASVIRAK" "LNLAVITSWTQLIQAEKETI"); appendValue(subjects, "GATRDSKGNAVITSFTQARLRVYADLLGPYWIILHVIELTGVGNTGQKCTLNHMGTYAVF" "DLKQPPATNDLGLPKPCFIGFDIQNELAIGTVGHSEAVIAAFTQRDRLEERAESKQSLAR" "PVISPKLIAEVSTVLESALNQMYSSLGFYRVERAEDYAQPRKLCVVKKKSFNCLNADIWL" "EYRMEDQKSVPKVFKIMMDD"); appendValue(qIds, "Query_Numero_Uno with args"); appendValue(qIds, "Query_Numero_Dos with args"); appendValue(qIds, "Query_Numero_Tres with args"); appendValue(sIds, "Subject_Numero_Uno"); appendValue(sIds, "Subject_Numero_Dos");
- New code
BlastTabularFileOut<TContext> outfile(argv[1]); String<TBlastRecord> records; // protein vs protein search is BLASTP context(outfile).blastProgram = BlastProgram::BLASTP; // set gap parameters in blast notation setScoreGapOpenBlast(context(outfile).scoringScheme, -11); setScoreGapExtend(context(outfile).scoringScheme, -1); SEQAN_ASSERT(isValid(context(outfile).scoringScheme)); // set the database properties in the context context(outfile).dbName = "The Foo Database"; context(outfile).dbTotalLength = length(concat(subjects));
- Bottom
writeFooter(outfile); return 0; } #else int main() { std::cerr << "USAGE: FILE_OUT\n"; return 0; } #endif
Assignment 6¶
- Objective
- Next create a record for every query sequence, and in each record a match for every query-subject pair. Compute the local alignment for each of those matches. Use the align member of the match object.
- Solution
- Top
#include <iostream> #include <seqan/basic.h> #ifndef STDLIB_VS #include <seqan/blast.h> using namespace seqan; int main(int argc, char ** argv) { if (argc != 2) { std::cerr << "USAGE: FILE_OUT\n"; return 0; } typedef String<AminoAcid> TSequence; typedef std::string TId; typedef Gaps<TSequence, ArrayGaps> TGaps; typedef BlastMatch<TGaps, TGaps> TBlastMatch; typedef BlastRecord<TBlastMatch> TBlastRecord; typedef BlastIOContext<Blosum62> TContext; StringSet<TSequence> queries; StringSet<TSequence> subjects; StringSet<TId> qIds; StringSet<TId> sIds; appendValue(queries, "VAYAQPRKLCYP"); appendValue(queries, "NAYPRUTEIN"); appendValue(queries, "AVITSFTQ"); appendValue(subjects, "SSITEEKHIPHKEQDKDAEFLSKEALKTHMTENVLQMDRRAVQDPSTSFLQLLKAKGLLG" "LPDYEVNLADVNSPGFRKVAYAQTKPRRLCFPNGGTRRGSFIMDTAVVVMVSLRYVNIGK" "VIFPGATDVSEGEDEFWAGLPQAYGCLATEFLCIHIAIYSWIHVQSSRYDDMNASVIRAK" "LNLAVITSWTQLIQAEKETI"); appendValue(subjects, "GATRDSKGNAVITSFTQARLRVYADLLGPYWIILHVIELTGVGNTGQKCTLNHMGTYAVF" "DLKQPPATNDLGLPKPCFIGFDIQNELAIGTVGHSEAVIAAFTQRDRLEERAESKQSLAR" "PVISPKLIAEVSTVLESALNQMYSSLGFYRVERAEDYAQPRKLCVVKKKSFNCLNADIWL" "EYRMEDQKSVPKVFKIMMDD"); appendValue(qIds, "Query_Numero_Uno with args"); appendValue(qIds, "Query_Numero_Dos with args"); appendValue(qIds, "Query_Numero_Tres with args"); appendValue(sIds, "Subject_Numero_Uno"); appendValue(sIds, "Subject_Numero_Dos"); BlastTabularFileOut<TContext> outfile(argv[1]); String<TBlastRecord> records; // protein vs protein search is BLASTP context(outfile).blastProgram = BlastProgram::BLASTP; // set gap parameters in blast notation setScoreGapOpenBlast(context(outfile).scoringScheme, -11); setScoreGapExtend(context(outfile).scoringScheme, -1); SEQAN_ASSERT(isValid(context(outfile).scoringScheme)); // set the database properties in the context context(outfile).dbName = "The Foo Database"; context(outfile).dbTotalLength = length(concat(subjects));
- New Code
context(outfile).dbNumberOfSeqs = length(subjects); writeHeader(outfile); // write file header for (unsigned q = 0; q < length(queries); ++q) { appendValue(records, TBlastRecord(qIds[q])); TBlastRecord & r = back(records); r.qLength = length(queries[q]); for (unsigned s = 0; s < length(subjects); ++s) { appendValue(r.matches, TBlastMatch(sIds[s])); TBlastMatch & m = back(records[q].matches); assignSource(m.alignRow0, queries[q]); assignSource(m.alignRow1, subjects[s]); if (m.eValue > 1) eraseBack(records[q].matches); } writeRecord(outfile, r);
- Bottom
writeFooter(outfile); return 0; } #else int main() { std::cerr << "USAGE: FILE_OUT\n"; return 0; } #endif
Assignment 7¶
- Objective
- Now that you have the align member computed for every match, also save the begin and end positions, as well as the lengths. Blast Output needs to now about the number of gaps, mismatches… of every match, how can they be computed?
- Solution
- Top
#include <iostream> #include <seqan/basic.h> #ifndef STDLIB_VS #include <seqan/blast.h> using namespace seqan; int main(int argc, char ** argv) { if (argc != 2) { std::cerr << "USAGE: FILE_OUT\n"; return 0; } typedef String<AminoAcid> TSequence; typedef std::string TId; typedef Gaps<TSequence, ArrayGaps> TGaps; typedef BlastMatch<TGaps, TGaps> TBlastMatch; typedef BlastRecord<TBlastMatch> TBlastRecord; typedef BlastIOContext<Blosum62> TContext; StringSet<TSequence> queries; StringSet<TSequence> subjects; StringSet<TId> qIds; StringSet<TId> sIds; appendValue(queries, "VAYAQPRKLCYP"); appendValue(queries, "NAYPRUTEIN"); appendValue(queries, "AVITSFTQ"); appendValue(subjects, "SSITEEKHIPHKEQDKDAEFLSKEALKTHMTENVLQMDRRAVQDPSTSFLQLLKAKGLLG" "LPDYEVNLADVNSPGFRKVAYAQTKPRRLCFPNGGTRRGSFIMDTAVVVMVSLRYVNIGK" "VIFPGATDVSEGEDEFWAGLPQAYGCLATEFLCIHIAIYSWIHVQSSRYDDMNASVIRAK" "LNLAVITSWTQLIQAEKETI"); appendValue(subjects, "GATRDSKGNAVITSFTQARLRVYADLLGPYWIILHVIELTGVGNTGQKCTLNHMGTYAVF" "DLKQPPATNDLGLPKPCFIGFDIQNELAIGTVGHSEAVIAAFTQRDRLEERAESKQSLAR" "PVISPKLIAEVSTVLESALNQMYSSLGFYRVERAEDYAQPRKLCVVKKKSFNCLNADIWL" "EYRMEDQKSVPKVFKIMMDD"); appendValue(qIds, "Query_Numero_Uno with args"); appendValue(qIds, "Query_Numero_Dos with args"); appendValue(qIds, "Query_Numero_Tres with args"); appendValue(sIds, "Subject_Numero_Uno"); appendValue(sIds, "Subject_Numero_Dos"); BlastTabularFileOut<TContext> outfile(argv[1]); String<TBlastRecord> records; // protein vs protein search is BLASTP context(outfile).blastProgram = BlastProgram::BLASTP; // set gap parameters in blast notation setScoreGapOpenBlast(context(outfile).scoringScheme, -11); setScoreGapExtend(context(outfile).scoringScheme, -1); SEQAN_ASSERT(isValid(context(outfile).scoringScheme)); // set the database properties in the context context(outfile).dbName = "The Foo Database"; context(outfile).dbTotalLength = length(concat(subjects)); context(outfile).dbNumberOfSeqs = length(subjects); writeHeader(outfile); // write file header for (unsigned q = 0; q < length(queries); ++q) { appendValue(records, TBlastRecord(qIds[q])); TBlastRecord & r = back(records); r.qLength = length(queries[q]); for (unsigned s = 0; s < length(subjects); ++s) { appendValue(r.matches, TBlastMatch(sIds[s])); TBlastMatch & m = back(records[q].matches); assignSource(m.alignRow0, queries[q]); assignSource(m.alignRow1, subjects[s]);
- New Code
localAlignment(m.alignRow0, m.alignRow1, seqanScheme(context(outfile).scoringScheme)); m.qStart = beginPosition(m.alignRow0); m.qEnd = endPosition(m.alignRow0); m.sStart = beginPosition(m.alignRow1); m.sEnd = endPosition(m.alignRow1); m.sLength = length(subjects[s]);
- Bottom
if (m.eValue > 1) eraseBack(records[q].matches); } writeRecord(outfile, r); writeFooter(outfile); return 0; } #else int main() { std::cerr << "USAGE: FILE_OUT\n"; return 0; } #endif
Assignment 8¶
- Objective
- Finally add e-value statistics and print the results to a file:
- compute the bit score and e-value for every match
- discard matches with an e-value greater than 1
- for every record, sort the matches by bit-score
- write each record
- write the footer before exiting the program
- Solution
- Top
#include <iostream> #include <seqan/basic.h> #ifndef STDLIB_VS #include <seqan/blast.h> using namespace seqan; int main(int argc, char ** argv) { if (argc != 2) { std::cerr << "USAGE: FILE_OUT\n"; return 0; } typedef String<AminoAcid> TSequence; typedef std::string TId; typedef Gaps<TSequence, ArrayGaps> TGaps; typedef BlastMatch<TGaps, TGaps> TBlastMatch; typedef BlastRecord<TBlastMatch> TBlastRecord; typedef BlastIOContext<Blosum62> TContext; StringSet<TSequence> queries; StringSet<TSequence> subjects; StringSet<TId> qIds; StringSet<TId> sIds; appendValue(queries, "VAYAQPRKLCYP"); appendValue(queries, "NAYPRUTEIN"); appendValue(queries, "AVITSFTQ"); appendValue(subjects, "SSITEEKHIPHKEQDKDAEFLSKEALKTHMTENVLQMDRRAVQDPSTSFLQLLKAKGLLG" "LPDYEVNLADVNSPGFRKVAYAQTKPRRLCFPNGGTRRGSFIMDTAVVVMVSLRYVNIGK" "VIFPGATDVSEGEDEFWAGLPQAYGCLATEFLCIHIAIYSWIHVQSSRYDDMNASVIRAK" "LNLAVITSWTQLIQAEKETI"); appendValue(subjects, "GATRDSKGNAVITSFTQARLRVYADLLGPYWIILHVIELTGVGNTGQKCTLNHMGTYAVF" "DLKQPPATNDLGLPKPCFIGFDIQNELAIGTVGHSEAVIAAFTQRDRLEERAESKQSLAR" "PVISPKLIAEVSTVLESALNQMYSSLGFYRVERAEDYAQPRKLCVVKKKSFNCLNADIWL" "EYRMEDQKSVPKVFKIMMDD"); appendValue(qIds, "Query_Numero_Uno with args"); appendValue(qIds, "Query_Numero_Dos with args"); appendValue(qIds, "Query_Numero_Tres with args"); appendValue(sIds, "Subject_Numero_Uno"); appendValue(sIds, "Subject_Numero_Dos"); BlastTabularFileOut<TContext> outfile(argv[1]); String<TBlastRecord> records; // protein vs protein search is BLASTP context(outfile).blastProgram = BlastProgram::BLASTP; // set gap parameters in blast notation setScoreGapOpenBlast(context(outfile).scoringScheme, -11); setScoreGapExtend(context(outfile).scoringScheme, -1); SEQAN_ASSERT(isValid(context(outfile).scoringScheme)); // set the database properties in the context context(outfile).dbName = "The Foo Database"; context(outfile).dbTotalLength = length(concat(subjects)); context(outfile).dbNumberOfSeqs = length(subjects); writeHeader(outfile); // write file header for (unsigned q = 0; q < length(queries); ++q) { appendValue(records, TBlastRecord(qIds[q])); TBlastRecord & r = back(records); r.qLength = length(queries[q]); for (unsigned s = 0; s < length(subjects); ++s) { appendValue(r.matches, TBlastMatch(sIds[s])); TBlastMatch & m = back(records[q].matches); assignSource(m.alignRow0, queries[q]); assignSource(m.alignRow1, subjects[s]); localAlignment(m.alignRow0, m.alignRow1, seqanScheme(context(outfile).scoringScheme)); m.qStart = beginPosition(m.alignRow0); m.qEnd = endPosition(m.alignRow0); m.sStart = beginPosition(m.alignRow1); m.sEnd = endPosition(m.alignRow1); m.sLength = length(subjects[s]);
- New Code
computeAlignmentStats(m, context(outfile)); computeBitScore(m, context(outfile)); computeEValue(m, r.qLength, context(outfile));
- Bottom
if (m.eValue > 1) eraseBack(records[q].matches); } r.matches.sort(); // sort by bitscore writeRecord(outfile, r); } writeFooter(outfile); return 0; } #else int main() { std::cerr << "USAGE: FILE_OUT\n"; return 0; } #endif
- Your output file should look like this:
# BLASTP 2.2.26+ [I/O Module of SeqAn-2.0.0, http://www.seqan.de] # Query: Query_Numero_Uno with args # Database: The Foo Database # Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score # 2 hits found Query_Numero_Uno Subject_Numero_Uno 71.43 14 2 1 1 12 79 92 5e-04 23.1 Query_Numero_Uno Subject_Numero_Dos 100.00 8 0 0 3 10 157 164 0.001 22.3 # BLASTP 2.2.26+ [I/O Module of SeqAn-2.0.0, http://www.seqan.de] # Query: Query_Numero_Dos with args # Database: The Foo Database # 0 hits found # BLASTP 2.2.26+ [I/O Module of SeqAn-2.0.0, http://www.seqan.de] # Query: Query_Numero_Tres with args # Database: The Foo Database # Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score # 2 hits found Query_Numero_Tres Subject_Numero_Dos 100.00 8 0 0 1 8 10 17 0.007 18.9 Query_Numero_Tres Subject_Numero_Uno 87.50 8 1 0 1 8 184 191 0.026 16.9 # BLAST processed 3 queries
Assignment 9¶
- Objective
- Up until now you have only printed the tabular with comment lines format. What do you have to do to print without comment lines? In legacy format? What about the pairwise format?
- Solution
- Tabular without comment lines
Add
context(outfile).tabularSpec = BlastTabularSpec::NO_COMMENTS
at l.53. Remember to use.m8
as file extension!The result should look like this:
Query_Numero_Uno Subject_Numero_Uno 71.43 14 2 1 1 12 79 92 5e-04 23.1 Query_Numero_Uno Subject_Numero_Dos 100.00 8 0 0 3 10 157 164 0.001 22.3 Query_Numero_Tres Subject_Numero_Dos 100.00 8 0 0 1 8 10 17 0.007 18.9 Query_Numero_Tres Subject_Numero_Uno 87.50 8 1 0 1 8 184 191 0.026 16.9
- Tabular without comment lines (legacy)
To print in legacy tabular format (with or without comment lines), add
context(outfile).legacyFormat = true
at l.53.The result should look like this(legacy and NO_COMMENTS):
Query_Numero_Uno Subject_Numero_Uno 71.43 14 4 1 1 12 79 92 5e-04 23.1 Query_Numero_Uno Subject_Numero_Dos 100.00 8 0 0 3 10 157 164 0.001 22.3 Query_Numero_Tres Subject_Numero_Dos 100.00 8 0 0 1 8 10 17 0.007 18.9 Query_Numero_Tres Subject_Numero_Uno 87.50 8 1 0 1 8 184 191 0.026 16.9
- Pairwise format
To print in the pairwise format replace l.52 with
BlastReportFileOut<TContext> outfile(argv[1]);
.Remember to use
.m0
as file extension!The result should look like this:
BLASTP 2.2.26+ [I/O Module of SeqAn-2.0.0, http://www.seqan.de] Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402. Reference for SeqAn: Doering, A., D. Weese, T. Rausch, K. Reinert (2008): SeqAn -- An efficient, generic C++ library for sequence analysis. BMC Bioinformatics, 9(1), 11. BioMed Central Ltd. doi:10.1186/1471-2105-9-11 Database: The Foo Database 2 sequences; 400 total letters Query= Query_Numero_Uno with args Length=12 Score E Sequences producing significant alignments: (Bits) Value Subject_Numero_Uno 23 0.0005 Subject_Numero_Dos 22 0.0009 ALIGNMENTS > Subject_Numero_Uno Length=200 Score = 23.1 bits (48), Expect = 0.0005 Identities = 10/14 (71%), Positives = 12/14 (86%), Gaps = 2/14 (14%) Query 1 VAYAQ--PRKLCYP 12 VAYAQ PR+LC+P Sbjct 79 VAYAQTKPRRLCFP 92 > Subject_Numero_Dos Length=200 Score = 22.3 bits (46), Expect = 0.0009 Identities = 8/8 (100%), Positives = 8/8 (100%), Gaps = 0/8 (0%) Query 3 YAQPRKLC 10 YAQPRKLC Sbjct 157 YAQPRKLC 164 Lambda K H 0.267 0.0410 0.1400 Gapped Lambda K H 0.267 0.0410 0.1400 Effective search space used: 4800 Query= Query_Numero_Dos with args Length=10 ***** No hits found ***** Lambda K H 0.267 0.0410 0.1400 Gapped Lambda K H 0.267 0.0410 0.1400 Effective search space used: 4000 Query= Query_Numero_Tres with args Length=8 Score E Sequences producing significant alignments: (Bits) Value Subject_Numero_Dos 18 0.007 Subject_Numero_Uno 16 0.03 ALIGNMENTS > Subject_Numero_Dos Length=200 Score = 18.9 bits (37), Expect = 0.007 Identities = 8/8 (100%), Positives = 8/8 (100%), Gaps = 0/8 (0%) Query 1 AVITSFTQ 8 AVITSFTQ Sbjct 10 AVITSFTQ 17 > Subject_Numero_Uno Length=200 Score = 16.9 bits (32), Expect = 0.03 Identities = 7/8 (88%), Positives = 8/8 (100%), Gaps = 0/8 (0%) Query 1 AVITSFTQ 8 AVITS+TQ Sbjct 184 AVITSWTQ 191 Lambda K H 0.267 0.0410 0.1400 Gapped Lambda K H 0.267 0.0410 0.1400 Effective search space used: 3200 Database: The Foo Database Number of letters in database: 400 Number of sequences in database: 2 Matrix:BLOSUM62 Gap Penalties: Existence: 11, Extension: 1