Blast I/O

Learning Objective

In this tutorial, you will learn about different the Blast file formats and how to interact with them in SeqAn.

Difficulty

Average

Duration

1h30min - 2h30min

Prerequisite Tutorials

Sequences, File I/O Overview, Alignment, Pairwise Sequence Alignment

Other recommended reading

Basics of Blast Statistics

Technical requirements

Full C++11 support required in compiler (GCC \(\ge\) 4.9, Clang \(\ge\) 3.4 or MSVC \(\ge\) 2015)

The Basic local alignment search tool (BLAST) by the NCBI is one of the most widely used tools in Bioinformatics. It supports a variety of formats which, although widely used, are not standardized and partly even declared as “subject to unannounced and undocumented change”. This makes implementing them very hard and is one of the reasons dealing with Blast IO is more difficult than with other file formats.

Furthermore it is important to distinguish between the Blast version written in C (known by its blastall executable) and the C++ version (individual blastn, blastp… executables), also known as BLAST+. The formats and their identifiers changed between versions. The following table gives an overview:

Description

blastall

blast*

pairwise

-m 0

-outfmt 0

query-anchored showing identities

-m 1

-outfmt 1

query-anchored no identities

-m 2

-outfmt 2

flat query-anchored, show identities

-m 3

-outfmt 3

flat query-anchored, no identities

-m 4

-outfmt 4

query-anchored no identities and blunt ends

-m 5

flat query-anchored, no identities and blunt ends

-m 6

XML Blast output

-m 7

-outfmt 5

tabular

-m 8

-outfmt 6

tabular with comment lines

-m 9

-outfmt 7

Text ASN.1

-m 10

-outfmt 8

Binary ASN.1

-m 11

-outfmt 9

Comma-separated values

-outfmt 10

BLAST archive format (ASN.1)

-outfmt 11

The files written by blastall are considered the “legacy”-format in SeqAn. SeqAn has support for the following formats:

Format

read support

write support

based on version

pairwise

Blast-2.2.26+

tabular

Blast-2.2.26+

tabular (legacy)

Blast-2.2.26

tabular w comments

Blast-2.2.26+

tabular w comments (legacy)

Blast-2.2.26

Caution

Please note that Blast-2.2.26+ is not the same as Blast-2.2.26! One is version 2.2.26 of the C++ application suite (BLAST+) and the other is version 2.2.26 of the legacy application suite. There still are software releases for both generations.

There are different program modes in Blast which also influence the file format. In the legacy application suite these where specified with the -p parameter, in the BLAST+ suite they each have their own executable.

Program mode

query alphabet

subject alphabet

BlastN

nucleotide

nucleotide

BlastP

protein

protein

BlastX

translated nucl.

protein

TBlastN

protein

translated nucl.

TBlastX

translated nucl.

translated nucl.

Tabular formats

The tabular formats are tab-seperated-value formats (TSV), with twelve columns by default. Each line represents one match (or high scoring pair in Blast terminology). The twelve default columns are:

  1. Query sequence ID (truncated at first whitespace)

  2. Subject sequence ID (truncated at first whitespace)

  3. Percentage of identical positions

  4. Alignment length

  5. Number of mismatches

  6. Number of gap openings

  7. Start position of alignment on query sequence

  8. End position of alignment on query sequence

  9. Start position of alignment on subject sequence

  10. End position of alignment on subject sequence

  11. Expect value (length normalized bit score)

  12. Bit score (statistical significance indicator)

Note

Alignment positions in Blast

  1. Interval notation: Blast uses 1-based closed intervals for positions, i.e. a match from the 100th position to the 200th position of a sequence will be shown as 100  200 in the file. SeqAn internally uses 0-based half open intervals, i.e. it starts counting at position 0 and stores the first position behind the sequence as “end”, e.g. position 99 and 200 for our example.

  2. Reverse strands: For matches found on the reverse complement strand the positions are counted backwards from the end of the sequence, e.g. a match from the 100th position to the 200th position on a reverse complement strand of a sequence of length 500 will be shown as 400 300 in the file.

  3. Translation frames: Positions given in the file are always on the original untranslated sequence!

The writeRecord() function automatically does all of these conversions!

A tabular file could look like this (matches per query are sorted by e-value):

SHAA004TF	sp|P0A916|OMPW_SHIFL	50.43	115	49	2	389	733	1	107	1e-26	108
SHAA004TF	sp|P0A915|OMPW_ECOLI	50.43	115	49	2	389	733	1	107	1e-26	108
SHAA004TF	sp|P17266|OMPW_VIBCH	52.21	113	45	2	410	733	4	112	5e-26	106
SHAA004TF	sp|Q8ZP50|OMPW_SALTY	50.43	115	49	2	389	733	1	107	3e-24	101
SHAA004TF	sp|Q8Z7E2|OMPW_SALTI	50.43	115	49	2	389	733	1	107	3e-24	101
SHAA004TF	sp|P19766|INSB_SHISO	100.00	18	0	0	803	750	114	131	4e-04	43.1
SHAA004TF	sp|P19765|INSB_SHIFL	100.00	18	0	0	803	750	114	131	4e-04	43.1
SHAA004TF	sp|P03831|INBD_SHIDY	100.00	18	0	0	803	750	114	131	4e-04	43.1
SHAA004TF	sp|P59843|INSB_HAEDU	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P0CF31|INSB_ECOLX	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P0CF30|INSB8_ECOLI	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P0CF29|INSB6_ECOLI	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P0CF28|INSB5_ECOLI	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P57998|INSB4_ECOLI	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P0CF25|INSB1_ECOLI	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P0CF27|INSB3_ECOLI	94.44	18	1	0	803	750	150	167	0.004	40.8
SHAA004TF	sp|P0CF26|INSB2_ECOLI	94.44	18	1	0	803	750	150	167	0.004	40.8
SHAA004TR	sp|Q0HTA5|META_SHESR	100.00	77	0	0	232	2	1	77	3e-42	152
SHAA004TR	sp|Q0HGZ8|META_SHESM	100.00	77	0	0	232	2	1	77	3e-42	152

The tabular with comment lines format additionally prefixes every block belonging to one query sequence with comment lines that include the program version, the database name and column labels. The above example would look like this:

# BLASTX 2.2.26+
# Query: SHAA003TR  Sample 1 Mate SHAA003TF trimmed_to 17 935
# Database: /tmp/uniprot_sprot.fasta
# 0 hits found
# BLASTX 2.2.26+
# Query: SHAA004TF  Sample 1 Mate SHAA004TR trimmed_to 25 828
# Database: /tmp/uniprot_sprot.fasta
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 17 hits found
SHAA004TF	sp|P0A916|OMPW_SHIFL	50.43	115	49	2	389	733	1	107	1e-26	108
SHAA004TF	sp|P0A915|OMPW_ECOLI	50.43	115	49	2	389	733	1	107	1e-26	108
SHAA004TF	sp|P17266|OMPW_VIBCH	52.21	113	45	2	410	733	4	112	5e-26	106
SHAA004TF	sp|Q8ZP50|OMPW_SALTY	50.43	115	49	2	389	733	1	107	3e-24	101
SHAA004TF	sp|Q8Z7E2|OMPW_SALTI	50.43	115	49	2	389	733	1	107	3e-24	101
SHAA004TF	sp|P19766|INSB_SHISO	100.00	18	0	0	803	750	114	131	4e-04	43.1
SHAA004TF	sp|P19765|INSB_SHIFL	100.00	18	0	0	803	750	114	131	4e-04	43.1
SHAA004TF	sp|P03831|INBD_SHIDY	100.00	18	0	0	803	750	114	131	4e-04	43.1
SHAA004TF	sp|P59843|INSB_HAEDU	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P0CF31|INSB_ECOLX	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P0CF30|INSB8_ECOLI	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P0CF29|INSB6_ECOLI	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P0CF28|INSB5_ECOLI	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P57998|INSB4_ECOLI	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P0CF25|INSB1_ECOLI	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P0CF27|INSB3_ECOLI	94.44	18	1	0	803	750	150	167	0.004	40.8
SHAA004TF	sp|P0CF26|INSB2_ECOLI	94.44	18	1	0	803	750	150	167	0.004	40.8
# BLASTX 2.2.26+
# Query: SHAA004TR  Sample 1 Mate SHAA004TF trimmed_to 20 853
# Database: /tmp/uniprot_sprot.fasta
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 2 hits found
SHAA004TR	sp|Q0HTA5|META_SHESR	100.00	77	0	0	232	2	1	77	3e-42	152
SHAA004TR	sp|Q0HGZ8|META_SHESM	100.00	77	0	0	232	2	1	77	3e-42	152

As you can see, comment lines are also printed for query sequences which don’t have any matches. The major difference of these formats in BLAST+ vs the legacy application are that the mismatches column used to include the number of gap characters, but it does not in BLAST+. The comments also look slightly different in the tabular with comment lines (legacy) format:

# BLASTX 2.2.26 [Sep-21-2011]
# Query: SHAA003TR  Sample 1 Mate SHAA003TF trimmed_to 17 935
# Database: /tmp/uniprot_sprot.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
# BLASTX 2.2.26 [Sep-21-2011]
# Query: SHAA004TF  Sample 1 Mate SHAA004TR trimmed_to 25 828
# Database: /tmp/uniprot_sprot.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
SHAA004TF	sp|P0A916|OMPW_SHIFL	50.43	115	57	2	389	733	1	107	1e-26	108
SHAA004TF	sp|P0A915|OMPW_ECOLI	50.43	115	57	2	389	733	1	107	1e-26	108
SHAA004TF	sp|P17266|OMPW_VIBCH	52.21	113	49	2	410	733	4	112	5e-26	106
SHAA004TF	sp|Q8ZP50|OMPW_SALTY	50.43	115	57	2	389	733	1	107	3e-24	101
SHAA004TF	sp|Q8Z7E2|OMPW_SALTI	50.43	115	57	2	389	733	1	107	3e-24	101
SHAA004TF	sp|P19766|INSB_SHISO	100.00	18	0	0	803	750	114	131	4e-04	43.1
SHAA004TF	sp|P19765|INSB_SHIFL	100.00	18	0	0	803	750	114	131	4e-04	43.1
SHAA004TF	sp|P03831|INBD_SHIDY	100.00	18	0	0	803	750	114	131	4e-04	43.1
SHAA004TF	sp|P59843|INSB_HAEDU	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P0CF31|INSB_ECOLX	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P0CF30|INSB8_ECOLI	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P0CF29|INSB6_ECOLI	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P0CF28|INSB5_ECOLI	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P57998|INSB4_ECOLI	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P0CF25|INSB1_ECOLI	100.00	18	0	0	803	750	150	167	6e-04	43.1
SHAA004TF	sp|P0CF27|INSB3_ECOLI	94.44	18	1	0	803	750	150	167	0.004	40.8
SHAA004TF	sp|P0CF26|INSB2_ECOLI	94.44	18	1	0	803	750	150	167	0.004	40.8
# BLASTX 2.2.26 [Sep-21-2011]
# Query: SHAA004TR  Sample 1 Mate SHAA004TF trimmed_to 20 853
# Database: /tmp/uniprot_sprot.fasta
# Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
SHAA004TR	sp|Q0HTA5|META_SHESR	100.00	77	0	0	232	2	1	77	3e-42	152
SHAA004TR	sp|Q0HGZ8|META_SHESM	100.00	77	0	0	232	2	1	77	3e-42	152

Pairwise format

The pairwise format is the default format in Blast. It is more verbose than the tabular formats and very human readable.

This is what the last record from above would look like (the other queries are omitted):

BLASTX 2.2.26+


Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.



Database: /foo/bar/uniprot_sprot.fasta
           538,259 sequences; 191,113,170 total letters

[...]

Query= SHAA004TR  Sample 1 Mate SHAA004TF trimmed_to 20 853

Length=833
                                                                      Score     E
Sequences producing significant alignments:                          (Bits)  Value

  sp|Q0HTA5|META_SHESR Homoserine O-succinyltransferase OS=Shewan...   152    3e-42
  sp|Q0HGZ8|META_SHESM Homoserine O-succinyltransferase OS=Shewan...   152    3e-42


> sp|Q0HTA5|META_SHESR Homoserine O-succinyltransferase OS=Shewanella 
sp. (strain MR-7) GN=metA PE=3 SV=1
Length=313

 Score =  152 bits (385),  Expect = 3e-42
 Identities = 77/77 (100%), Positives = 77/77 (100%), Gaps = 0/77 (0%)
 Frame = -2

Query  232  MPVKIPDHLPAAGILESENIFVMSETRAANQDIRPMKVLILNLMPNKIETETQLLRLLGN  53
            MPVKIPDHLPAAGILESENIFVMSETRAANQDIRPMKVLILNLMPNKIETETQLLRLLGN
Sbjct  1    MPVKIPDHLPAAGILESENIFVMSETRAANQDIRPMKVLILNLMPNKIETETQLLRLLGN  60

Query  52   TPLQVDVDLLRIHDKES  2
            TPLQVDVDLLRIHDKES
Sbjct  61   TPLQVDVDLLRIHDKES  77


> sp|Q0HGZ8|META_SHESM Homoserine O-succinyltransferase OS=Shewanella 
sp. (strain MR-4) GN=metA PE=3 SV=1
Length=313

 Score =  152 bits (385),  Expect = 3e-42
 Identities = 77/77 (100%), Positives = 77/77 (100%), Gaps = 0/77 (0%)
 Frame = -2

Query  232  MPVKIPDHLPAAGILESENIFVMSETRAANQDIRPMKVLILNLMPNKIETETQLLRLLGN  53
            MPVKIPDHLPAAGILESENIFVMSETRAANQDIRPMKVLILNLMPNKIETETQLLRLLGN
Sbjct  1    MPVKIPDHLPAAGILESENIFVMSETRAANQDIRPMKVLILNLMPNKIETETQLLRLLGN  60

Query  52   TPLQVDVDLLRIHDKES  2
            TPLQVDVDLLRIHDKES
Sbjct  61   TPLQVDVDLLRIHDKES  77



Lambda     K      H
   0.318    0.134    0.401 

Gapped
Lambda     K      H
   0.267   0.0410    0.140 

Effective search space used: 20716695286


[...]

Blast formats in SeqAn

There are three blast format related tags in SeqAn:

  1. BlastReport for the pairwise format with the FormattedFile output specialization BlastReportFileOut.

  2. BlastTabular for the tabular formats with the FormattedFile output and input specializations BlastTabularFileOut and BlastTabularFileIn.

  3. BlastTabularLL which provides light-weight, but very basic tabular IO.

The third tag can be used for simple file manipulation, e.g. filtering or column rearrangement, it is not covered in this tutorial (see the dox for a simple example).

To work with the first two formats you need to understand at least the following data structures:

The context contains file-global data like the name of the database and can also be used to read/write certain file format properties, e.g. “with comment lines” or “legacyFormat”.

Caution

Due to the structure of blast tabular files lots of information is repeated in every block of comment lines, e.g. the database name. Because it is expected that these stay the same they are saved in the context and not the record. You may still, however, check every time you readRecord() if you want to make sure.

File reading example

Only tabular formats are covered in this example, because no input support is available for the pairwise format.

Copy the contents of the tabular with comment lines example above into a file and give it to the following program as the only parameter. Please use .m9 as file type extension.

#include <iostream>
#include <seqan/basic.h>
#ifndef STDLIB_VS
#include <seqan/blast.h>

using namespace seqan2;

int main(int argc, char **argv)
{
    if (argc != 2)
    {
      std::cerr << "USAGE: FILE_IN\n";
      return 0;
    }

    typedef Gaps<String<AminoAcid>, ArrayGaps> TGaps;
    typedef BlastMatch<TGaps, TGaps> TBlastMatch;
    typedef BlastRecord<TBlastMatch> TBlastRecord;
              << "Database: "
              << context(file).dbName
              << "\n\n";

    return 0;
}
#else
int main()
{
    std::cerr << "USAGE: FILE_IN\n";
    return 0;

Assignment 1

Objective

Complete the above example by reading the file according to BlastTabularFileIn. For every record print the query ID, the number of contained matches and the bit-score of the best match.

Solution
Top
#include <iostream>
#include <seqan/basic.h>
#ifndef STDLIB_VS
#include <seqan/blast.h>

using namespace seqan2;

int main(int argc, char **argv)
{
    if (argc != 2)
    {
      std::cerr << "USAGE: FILE_IN\n";
      return 0;
    }

    typedef Gaps<String<AminoAcid>, ArrayGaps> TGaps;
    typedef BlastMatch<TGaps, TGaps> TBlastMatch;
    typedef BlastRecord<TBlastMatch> TBlastRecord;
New code
    typedef BlastIOContext<> TContext;

    BlastTabularFileIn<TContext> file(argv[1]);

    readHeader(file);

    TBlastRecord record;

    while (onRecord(file))
    {
        // read the record
        readRecord(record, file);

        // print some diagnostics
        std::cout << "Record of query sequence \"" << record.qId << "\"\n"
                  << "==========================================\n"
            std::cout << "\n\n";
        }

        std::cout << "\n\n";
Bottom
              << "Database: "
              << context(file).dbName
              << "\n\n";

    return 0;
}
#else
int main()
{
    std::cerr << "USAGE: FILE_IN\n";
    return 0;

Assignment 2

Objective

Study the documentation of BlastIOContext. How can you adapt the previous program to check if there were any problems reading a record? If you have come up with a solution, try to read the file at tests/blast/defaultfields.m9. What does the program print and why?

Solution
Top
#include <iostream>
#include <seqan/basic.h>
#ifndef STDLIB_VS
#include <seqan/blast.h>

using namespace seqan2;

int main(int argc, char **argv)
{
    if (argc != 2)
    {
      std::cerr << "USAGE: FILE_IN\n";
      return 0;
    }

    typedef Gaps<String<AminoAcid>, ArrayGaps> TGaps;
    typedef BlastMatch<TGaps, TGaps> TBlastMatch;
    typedef BlastRecord<TBlastMatch> TBlastRecord;
    typedef BlastIOContext<> TContext;

    BlastTabularFileIn<TContext> file(argv[1]);

    readHeader(file);

    TBlastRecord record;

    while (onRecord(file))
    {
        // read the record
        readRecord(record, file);

        // print some diagnostics
        std::cout << "Record of query sequence \"" << record.qId << "\"\n"
                  << "==========================================\n"
New code
        for (auto field : context(file).fields)
            std::cout << BlastMatchField<>::optionLabels[(int)field] << " ";
        std::cout << "\n\n";

        // if there is anything unexpected, tell the user about it
        if (!empty(context(file).conformancyErrors))
        {
            std::cout << "There were non critical errors when reading the record:\n";
            write(std::cout, context(file).conformancyErrors);
            std::cout << "\n\n";
        }

        if (!empty(context(file).otherLines))
        {
            std::cout << "There were unidentified lines in the comments:\n";
            write(std::cout, context(file).otherLines);
            std::cout << "\n\n";
        }

        std::cout << "\n\n";

The program will print conformancyErrors for the last record, because there is a typo in the file ( Datacase instead of Database ).

Bottom
              << "Database: "
              << context(file).dbName
              << "\n\n";

    return 0;
}
#else
int main()
{
    std::cerr << "USAGE: FILE_IN\n";
    return 0;

Assignment 3

Objective

Now that you have a basic understanding of BlastIOContext, also print the following information after reading the records:

  • file format (with comment lines or without, BLAST+ or legacy?)

  • blast program and version

  • name of database

Verify that the results are as expected on the files tests/blast/defaultfields.m8, tests/blast/defaultfields.m9 and tests/blast/defaultfields_legacy.m9.

Solution
Top
#include <iostream>
#include <seqan/basic.h>
#ifndef STDLIB_VS
#include <seqan/blast.h>

using namespace seqan2;

int main(int argc, char **argv)
{
    if (argc != 2)
    {
      std::cerr << "USAGE: FILE_IN\n";
      return 0;
    }

    typedef Gaps<String<AminoAcid>, ArrayGaps> TGaps;
    typedef BlastMatch<TGaps, TGaps> TBlastMatch;
    typedef BlastRecord<TBlastMatch> TBlastRecord;
    typedef BlastIOContext<> TContext;

    BlastTabularFileIn<TContext> file(argv[1]);

    readHeader(file);

    TBlastRecord record;

    while (onRecord(file))
    {
        // read the record
        readRecord(record, file);

        // print some diagnostics
        std::cout << "Record of query sequence \"" << record.qId << "\"\n"
                  << "==========================================\n"
        for (auto field : context(file).fields)
            std::cout << BlastMatchField<>::optionLabels[(int)field] << " ";
        std::cout << "\n\n";

        // if there is anything unexpected, tell the user about it
        if (!empty(context(file).conformancyErrors))
        {
            std::cout << "There were non critical errors when reading the record:\n";
            write(std::cout, context(file).conformancyErrors);
            std::cout << "\n\n";
        }

        if (!empty(context(file).otherLines))
        {
            std::cout << "There were unidentified lines in the comments:\n";
            write(std::cout, context(file).otherLines);
            std::cout << "\n\n";
        }

        std::cout << "\n\n";
New code
    }

    readFooter(file);

    std::cout << "File Format: tabular"
              << (context(file).tabularSpec == BlastTabularSpec::COMMENTS ? " with coment lines" : "")
              << '\n'
              << "Generation: "
              << (context(file).legacyFormat ? " legacy" : " BLAST+")
              << '\n'
              << "Program and version: "
              << context(file).versionString
              << '\n'
Bottom
              << "Database: "
              << context(file).dbName
              << "\n\n";

    return 0;
}
#else
int main()
{
    std::cerr << "USAGE: FILE_IN\n";
    return 0;

Assignment 4

As previously mentioned, twelve columns are printed by default. This can be changed in BLAST+, also by means of the --outfmt parameter. A standards compliant file with comment lines and custom column composition can be read without further configuration in SeqAn.

Tip

Don’t believe it? Look at tests/blast/customfields.m9, as as you can see the bit score is in the 13th column (instead of the twelfth). If you run your program on this file, it should still print the correct bit-scores!

Objective

Read BlastIOContext again focusing on fields and also read BlastMatchField. Now adapt the previous program to print for every record the optionLabel of each field used.

Verify that the results are as expected on the files tests/blast/defaultfields.m9 and tests/blast/customfields.m9.

Solution
Top
#include <iostream>
#include <seqan/basic.h>
#ifndef STDLIB_VS
#include <seqan/blast.h>

using namespace seqan2;

int main(int argc, char **argv)
{
    if (argc != 2)
    {
      std::cerr << "USAGE: FILE_IN\n";
      return 0;
    }

    typedef Gaps<String<AminoAcid>, ArrayGaps> TGaps;
    typedef BlastMatch<TGaps, TGaps> TBlastMatch;
    typedef BlastRecord<TBlastMatch> TBlastRecord;
    typedef BlastIOContext<> TContext;

    BlastTabularFileIn<TContext> file(argv[1]);

    readHeader(file);

    TBlastRecord record;

    while (onRecord(file))
    {
        // read the record
        readRecord(record, file);

        // print some diagnostics
        std::cout << "Record of query sequence \"" << record.qId << "\"\n"
                  << "==========================================\n"
New code
                  << "Number of HSPs: " << length(record.matches) << "\n";
        if  (!empty(record.matches))
            std::cout << "BitScore of best HSP: " << front(record.matches).bitScore << "\n";

        // print column composition
        std::cout << "Columns: ";
Bottom
        for (auto field : context(file).fields)
            std::cout << BlastMatchField<>::optionLabels[(int)field] << " ";
        std::cout << "\n\n";

        // if there is anything unexpected, tell the user about it
        if (!empty(context(file).conformancyErrors))
        {
            std::cout << "There were non critical errors when reading the record:\n";
            write(std::cout, context(file).conformancyErrors);
            std::cout << "\n\n";
        }

        if (!empty(context(file).otherLines))
        {
            std::cout << "There were unidentified lines in the comments:\n";
            write(std::cout, context(file).otherLines);
            std::cout << "\n\n";
        }

        std::cout << "\n\n";
    }

    readFooter(file);

    std::cout << "File Format: tabular"
              << (context(file).tabularSpec == BlastTabularSpec::COMMENTS ? " with coment lines" : "")
              << '\n'
              << "Generation: "
              << (context(file).legacyFormat ? " legacy" : " BLAST+")
              << '\n'
              << "Program and version: "
              << context(file).versionString
              << '\n'
              << "Database: "
              << context(file).dbName
              << "\n\n";

    return 0;
}
#else
int main()
{
    std::cerr << "USAGE: FILE_IN\n";
    return 0;

If this was too easy, you can also try the same for tabular files without comment lines!

File writing example

The following program stub creates three query sequences and two subject sequences in amino acid alphabet. We will later generate records with matches and print these to disk.

#include <iostream>
#include <seqan/basic.h>
#ifndef STDLIB_VS
#include <seqan/blast.h>

using namespace seqan2;

int main(int argc, char ** argv)
{
    if (argc != 2)
    {
      std::cerr << "USAGE: FILE_OUT\n";
      return 0;
    }

    typedef String<AminoAcid>               TSequence;
    typedef std::string                     TId;
    typedef Gaps<TSequence, ArrayGaps>      TGaps;
    typedef BlastMatch<TGaps, TGaps>        TBlastMatch;
    typedef BlastRecord<TBlastMatch>        TBlastRecord;
    typedef BlastIOContext<Blosum62>        TContext;

    StringSet<TSequence>    queries;
    StringSet<TSequence>    subjects;
    StringSet<TId>          qIds;
    StringSet<TId>          sIds;

    appendValue(queries, "VAYAQPRKLCYP");
    appendValue(queries, "NAYPRUTEIN");
    appendValue(queries, "AVITSFTQ");

    appendValue(subjects,
        "SSITEEKHIPHKEQDKDAEFLSKEALKTHMTENVLQMDRRAVQDPSTSFLQLLKAKGLLG"
        "LPDYEVNLADVNSPGFRKVAYAQTKPRRLCFPNGGTRRGSFIMDTAVVVMVSLRYVNIGK"
        "VIFPGATDVSEGEDEFWAGLPQAYGCLATEFLCIHIAIYSWIHVQSSRYDDMNASVIRAK"
        "LNLAVITSWTQLIQAEKETI");

    appendValue(subjects,
        "GATRDSKGNAVITSFTQARLRVYADLLGPYWIILHVIELTGVGNTGQKCTLNHMGTYAVF"
        "DLKQPPATNDLGLPKPCFIGFDIQNELAIGTVGHSEAVIAAFTQRDRLEERAESKQSLAR"
        "PVISPKLIAEVSTVLESALNQMYSSLGFYRVERAEDYAQPRKLCVVKKKSFNCLNADIWL"
        "EYRMEDQKSVPKVFKIMMDD");

    appendValue(qIds, "Query_Numero_Uno with args");
    appendValue(qIds, "Query_Numero_Dos with args");
    appendValue(qIds, "Query_Numero_Tres with args");

    appendValue(sIds, "Subject_Numero_Uno");
    appendValue(sIds, "Subject_Numero_Dos");

    writeFooter(outfile);

    return 0;
}
#else
int main()
{
    std::cerr << "USAGE: FILE_OUT\n";
    return 0;

Assignment 5

Objective

Before we can begin to align, certain properties of the BlastIOContext have to be set. Which ones? Add a block with the necessary initialization to the above code, use blast defaults where possible. Although not strictly required at this point: include a call to writeHeader().

Caution

Alignment score computation works slightly different in Blast and in SeqAn, please have a look at BlastScoringScheme. For the task at hand it should suffice to simply use the corresponding set*() functions on scoringScheme.

Solution
Top
#include <iostream>
#include <seqan/basic.h>
#ifndef STDLIB_VS
#include <seqan/blast.h>

using namespace seqan2;

int main(int argc, char ** argv)
{
    if (argc != 2)
    {
      std::cerr << "USAGE: FILE_OUT\n";
      return 0;
    }

    typedef String<AminoAcid>               TSequence;
    typedef std::string                     TId;
    typedef Gaps<TSequence, ArrayGaps>      TGaps;
    typedef BlastMatch<TGaps, TGaps>        TBlastMatch;
    typedef BlastRecord<TBlastMatch>        TBlastRecord;
    typedef BlastIOContext<Blosum62>        TContext;

    StringSet<TSequence>    queries;
    StringSet<TSequence>    subjects;
    StringSet<TId>          qIds;
    StringSet<TId>          sIds;

    appendValue(queries, "VAYAQPRKLCYP");
    appendValue(queries, "NAYPRUTEIN");
    appendValue(queries, "AVITSFTQ");

    appendValue(subjects,
        "SSITEEKHIPHKEQDKDAEFLSKEALKTHMTENVLQMDRRAVQDPSTSFLQLLKAKGLLG"
        "LPDYEVNLADVNSPGFRKVAYAQTKPRRLCFPNGGTRRGSFIMDTAVVVMVSLRYVNIGK"
        "VIFPGATDVSEGEDEFWAGLPQAYGCLATEFLCIHIAIYSWIHVQSSRYDDMNASVIRAK"
        "LNLAVITSWTQLIQAEKETI");

    appendValue(subjects,
        "GATRDSKGNAVITSFTQARLRVYADLLGPYWIILHVIELTGVGNTGQKCTLNHMGTYAVF"
        "DLKQPPATNDLGLPKPCFIGFDIQNELAIGTVGHSEAVIAAFTQRDRLEERAESKQSLAR"
        "PVISPKLIAEVSTVLESALNQMYSSLGFYRVERAEDYAQPRKLCVVKKKSFNCLNADIWL"
        "EYRMEDQKSVPKVFKIMMDD");

    appendValue(qIds, "Query_Numero_Uno with args");
    appendValue(qIds, "Query_Numero_Dos with args");
    appendValue(qIds, "Query_Numero_Tres with args");

    appendValue(sIds, "Subject_Numero_Uno");
    appendValue(sIds, "Subject_Numero_Dos");
New code
    BlastTabularFileOut<TContext> outfile(argv[1]);
    String<TBlastRecord> records;

    // protein vs protein search is BLASTP
    context(outfile).blastProgram = BlastProgram::BLASTP;

    // set gap parameters in blast notation
    setScoreGapOpenBlast(context(outfile).scoringScheme, -11);
    setScoreGapExtend(context(outfile).scoringScheme, -1);
    SEQAN_ASSERT(isValid(context(outfile).scoringScheme));

    // set the database properties in the context
    context(outfile).dbName = "The Foo Database";
    context(outfile).dbTotalLength = length(concat(subjects));
Bottom
    writeFooter(outfile);

    return 0;
}
#else
int main()
{
    std::cerr << "USAGE: FILE_OUT\n";
    return 0;
}
#endif

Assignment 6

Objective

Next create a record for every query sequence, and in each record a match for every query-subject pair. Compute the local alignment for each of those matches. Use the align member of the match object.

Solution
Top
#include <iostream>
#include <seqan/basic.h>
#ifndef STDLIB_VS
#include <seqan/blast.h>

using namespace seqan2;

int main(int argc, char ** argv)
{
    if (argc != 2)
    {
      std::cerr << "USAGE: FILE_OUT\n";
      return 0;
    }

    typedef String<AminoAcid>               TSequence;
    typedef std::string                     TId;
    typedef Gaps<TSequence, ArrayGaps>      TGaps;
    typedef BlastMatch<TGaps, TGaps>        TBlastMatch;
    typedef BlastRecord<TBlastMatch>        TBlastRecord;
    typedef BlastIOContext<Blosum62>        TContext;

    StringSet<TSequence>    queries;
    StringSet<TSequence>    subjects;
    StringSet<TId>          qIds;
    StringSet<TId>          sIds;

    appendValue(queries, "VAYAQPRKLCYP");
    appendValue(queries, "NAYPRUTEIN");
    appendValue(queries, "AVITSFTQ");

    appendValue(subjects,
        "SSITEEKHIPHKEQDKDAEFLSKEALKTHMTENVLQMDRRAVQDPSTSFLQLLKAKGLLG"
        "LPDYEVNLADVNSPGFRKVAYAQTKPRRLCFPNGGTRRGSFIMDTAVVVMVSLRYVNIGK"
        "VIFPGATDVSEGEDEFWAGLPQAYGCLATEFLCIHIAIYSWIHVQSSRYDDMNASVIRAK"
        "LNLAVITSWTQLIQAEKETI");

    appendValue(subjects,
        "GATRDSKGNAVITSFTQARLRVYADLLGPYWIILHVIELTGVGNTGQKCTLNHMGTYAVF"
        "DLKQPPATNDLGLPKPCFIGFDIQNELAIGTVGHSEAVIAAFTQRDRLEERAESKQSLAR"
        "PVISPKLIAEVSTVLESALNQMYSSLGFYRVERAEDYAQPRKLCVVKKKSFNCLNADIWL"
        "EYRMEDQKSVPKVFKIMMDD");

    appendValue(qIds, "Query_Numero_Uno with args");
    appendValue(qIds, "Query_Numero_Dos with args");
    appendValue(qIds, "Query_Numero_Tres with args");

    appendValue(sIds, "Subject_Numero_Uno");
    appendValue(sIds, "Subject_Numero_Dos");

    BlastTabularFileOut<TContext> outfile(argv[1]);
    String<TBlastRecord> records;

    // protein vs protein search is BLASTP
    context(outfile).blastProgram = BlastProgram::BLASTP;

    // set gap parameters in blast notation
    setScoreGapOpenBlast(context(outfile).scoringScheme, -11);
    setScoreGapExtend(context(outfile).scoringScheme, -1);
    SEQAN_ASSERT(isValid(context(outfile).scoringScheme));

    // set the database properties in the context
    context(outfile).dbName = "The Foo Database";
    context(outfile).dbTotalLength = length(concat(subjects));
New Code
    context(outfile).dbNumberOfSeqs = length(subjects);

    writeHeader(outfile); // write file header

    for (unsigned q = 0; q < length(queries); ++q)
    {
        appendValue(records, TBlastRecord(qIds[q]));
        TBlastRecord & r = back(records);

        r.qLength = length(queries[q]);

        for (unsigned s = 0; s < length(subjects); ++s)
        {
            appendValue(r.matches, TBlastMatch(sIds[s]));
            TBlastMatch & m = back(records[q].matches);

            assignSource(m.alignRow0, queries[q]);
            assignSource(m.alignRow1, subjects[s]);
            if (m.eValue > 1)
                eraseBack(records[q].matches);
        }
        writeRecord(outfile, r);
Bottom
    writeFooter(outfile);

    return 0;
}
#else
int main()
{
    std::cerr << "USAGE: FILE_OUT\n";
    return 0;
}
#endif

Assignment 7

Objective

Now that you have the align member computed for every match, also save the begin and end positions, as well as the lengths. Blast Output needs to now about the number of gaps, mismatches… of every match, how can they be computed?

Solution
Top
#include <iostream>
#include <seqan/basic.h>
#ifndef STDLIB_VS
#include <seqan/blast.h>

using namespace seqan2;

int main(int argc, char ** argv)
{
    if (argc != 2)
    {
      std::cerr << "USAGE: FILE_OUT\n";
      return 0;
    }

    typedef String<AminoAcid>               TSequence;
    typedef std::string                     TId;
    typedef Gaps<TSequence, ArrayGaps>      TGaps;
    typedef BlastMatch<TGaps, TGaps>        TBlastMatch;
    typedef BlastRecord<TBlastMatch>        TBlastRecord;
    typedef BlastIOContext<Blosum62>        TContext;

    StringSet<TSequence>    queries;
    StringSet<TSequence>    subjects;
    StringSet<TId>          qIds;
    StringSet<TId>          sIds;

    appendValue(queries, "VAYAQPRKLCYP");
    appendValue(queries, "NAYPRUTEIN");
    appendValue(queries, "AVITSFTQ");

    appendValue(subjects,
        "SSITEEKHIPHKEQDKDAEFLSKEALKTHMTENVLQMDRRAVQDPSTSFLQLLKAKGLLG"
        "LPDYEVNLADVNSPGFRKVAYAQTKPRRLCFPNGGTRRGSFIMDTAVVVMVSLRYVNIGK"
        "VIFPGATDVSEGEDEFWAGLPQAYGCLATEFLCIHIAIYSWIHVQSSRYDDMNASVIRAK"
        "LNLAVITSWTQLIQAEKETI");

    appendValue(subjects,
        "GATRDSKGNAVITSFTQARLRVYADLLGPYWIILHVIELTGVGNTGQKCTLNHMGTYAVF"
        "DLKQPPATNDLGLPKPCFIGFDIQNELAIGTVGHSEAVIAAFTQRDRLEERAESKQSLAR"
        "PVISPKLIAEVSTVLESALNQMYSSLGFYRVERAEDYAQPRKLCVVKKKSFNCLNADIWL"
        "EYRMEDQKSVPKVFKIMMDD");

    appendValue(qIds, "Query_Numero_Uno with args");
    appendValue(qIds, "Query_Numero_Dos with args");
    appendValue(qIds, "Query_Numero_Tres with args");

    appendValue(sIds, "Subject_Numero_Uno");
    appendValue(sIds, "Subject_Numero_Dos");

    BlastTabularFileOut<TContext> outfile(argv[1]);
    String<TBlastRecord> records;

    // protein vs protein search is BLASTP
    context(outfile).blastProgram = BlastProgram::BLASTP;

    // set gap parameters in blast notation
    setScoreGapOpenBlast(context(outfile).scoringScheme, -11);
    setScoreGapExtend(context(outfile).scoringScheme, -1);
    SEQAN_ASSERT(isValid(context(outfile).scoringScheme));

    // set the database properties in the context
    context(outfile).dbName = "The Foo Database";
    context(outfile).dbTotalLength = length(concat(subjects));
    context(outfile).dbNumberOfSeqs = length(subjects);

    writeHeader(outfile); // write file header

    for (unsigned q = 0; q < length(queries); ++q)
    {
        appendValue(records, TBlastRecord(qIds[q]));
        TBlastRecord & r = back(records);

        r.qLength = length(queries[q]);

        for (unsigned s = 0; s < length(subjects); ++s)
        {
            appendValue(r.matches, TBlastMatch(sIds[s]));
            TBlastMatch & m = back(records[q].matches);

            assignSource(m.alignRow0, queries[q]);
            assignSource(m.alignRow1, subjects[s]);
New Code
            localAlignment(m.alignRow0, m.alignRow1, seqanScheme(context(outfile).scoringScheme));

            m.qStart = beginPosition(m.alignRow0);
            m.qEnd   = endPosition(m.alignRow0);
            m.sStart = beginPosition(m.alignRow1);
            m.sEnd   = endPosition(m.alignRow1);

            m.sLength = length(subjects[s]);
Bottom
            if (m.eValue > 1)
                eraseBack(records[q].matches);
        }
        writeRecord(outfile, r);
    writeFooter(outfile);

    return 0;
}
#else
int main()
{
    std::cerr << "USAGE: FILE_OUT\n";
    return 0;
}
#endif

Assignment 8

Objective
Finally add e-value statistics and print the results to a file:
  • compute the bit score and e-value for every match

  • discard matches with an e-value greater than 1

  • for every record, sort the matches by bit-score

  • write each record

  • write the footer before exiting the program

Solution
Top
#include <iostream>
#include <seqan/basic.h>
#ifndef STDLIB_VS
#include <seqan/blast.h>

using namespace seqan2;

int main(int argc, char ** argv)
{
    if (argc != 2)
    {
      std::cerr << "USAGE: FILE_OUT\n";
      return 0;
    }

    typedef String<AminoAcid>               TSequence;
    typedef std::string                     TId;
    typedef Gaps<TSequence, ArrayGaps>      TGaps;
    typedef BlastMatch<TGaps, TGaps>        TBlastMatch;
    typedef BlastRecord<TBlastMatch>        TBlastRecord;
    typedef BlastIOContext<Blosum62>        TContext;

    StringSet<TSequence>    queries;
    StringSet<TSequence>    subjects;
    StringSet<TId>          qIds;
    StringSet<TId>          sIds;

    appendValue(queries, "VAYAQPRKLCYP");
    appendValue(queries, "NAYPRUTEIN");
    appendValue(queries, "AVITSFTQ");

    appendValue(subjects,
        "SSITEEKHIPHKEQDKDAEFLSKEALKTHMTENVLQMDRRAVQDPSTSFLQLLKAKGLLG"
        "LPDYEVNLADVNSPGFRKVAYAQTKPRRLCFPNGGTRRGSFIMDTAVVVMVSLRYVNIGK"
        "VIFPGATDVSEGEDEFWAGLPQAYGCLATEFLCIHIAIYSWIHVQSSRYDDMNASVIRAK"
        "LNLAVITSWTQLIQAEKETI");

    appendValue(subjects,
        "GATRDSKGNAVITSFTQARLRVYADLLGPYWIILHVIELTGVGNTGQKCTLNHMGTYAVF"
        "DLKQPPATNDLGLPKPCFIGFDIQNELAIGTVGHSEAVIAAFTQRDRLEERAESKQSLAR"
        "PVISPKLIAEVSTVLESALNQMYSSLGFYRVERAEDYAQPRKLCVVKKKSFNCLNADIWL"
        "EYRMEDQKSVPKVFKIMMDD");

    appendValue(qIds, "Query_Numero_Uno with args");
    appendValue(qIds, "Query_Numero_Dos with args");
    appendValue(qIds, "Query_Numero_Tres with args");

    appendValue(sIds, "Subject_Numero_Uno");
    appendValue(sIds, "Subject_Numero_Dos");

    BlastTabularFileOut<TContext> outfile(argv[1]);
    String<TBlastRecord> records;

    // protein vs protein search is BLASTP
    context(outfile).blastProgram = BlastProgram::BLASTP;

    // set gap parameters in blast notation
    setScoreGapOpenBlast(context(outfile).scoringScheme, -11);
    setScoreGapExtend(context(outfile).scoringScheme, -1);
    SEQAN_ASSERT(isValid(context(outfile).scoringScheme));

    // set the database properties in the context
    context(outfile).dbName = "The Foo Database";
    context(outfile).dbTotalLength = length(concat(subjects));
    context(outfile).dbNumberOfSeqs = length(subjects);

    writeHeader(outfile); // write file header

    for (unsigned q = 0; q < length(queries); ++q)
    {
        appendValue(records, TBlastRecord(qIds[q]));
        TBlastRecord & r = back(records);

        r.qLength = length(queries[q]);

        for (unsigned s = 0; s < length(subjects); ++s)
        {
            appendValue(r.matches, TBlastMatch(sIds[s]));
            TBlastMatch & m = back(records[q].matches);

            assignSource(m.alignRow0, queries[q]);
            assignSource(m.alignRow1, subjects[s]);

            localAlignment(m.alignRow0, m.alignRow1, seqanScheme(context(outfile).scoringScheme));

            m.qStart = beginPosition(m.alignRow0);
            m.qEnd   = endPosition(m.alignRow0);
            m.sStart = beginPosition(m.alignRow1);
            m.sEnd   = endPosition(m.alignRow1);

            m.sLength = length(subjects[s]);
New Code
            computeAlignmentStats(m, context(outfile));
            computeBitScore(m, context(outfile));
            computeEValue(m, r.qLength, context(outfile));
Bottom
            if (m.eValue > 1)
                eraseBack(records[q].matches);
        }

        r.matches.sort(); // sort by bitscore

        writeRecord(outfile, r);
    }

    writeFooter(outfile);

    return 0;
}
#else
int main()
{
    std::cerr << "USAGE: FILE_OUT\n";
    return 0;
}
#endif
Your output file should look like this:
# BLASTP 2.2.26+ [I/O Module of SeqAn-2.0.0, https://www.seqan.de]
# Query: Query_Numero_Uno with args
# Database: The Foo Database
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 2 hits found
Query_Numero_Uno	Subject_Numero_Uno	71.43	14	2	1	1	12	79	92	5e-04	23.1
Query_Numero_Uno	Subject_Numero_Dos	100.00	8	0	0	3	10	157	164	0.001	22.3
# BLASTP 2.2.26+ [I/O Module of SeqAn-2.0.0, https://www.seqan.de]
# Query: Query_Numero_Dos with args
# Database: The Foo Database
# 0 hits found
# BLASTP 2.2.26+ [I/O Module of SeqAn-2.0.0, https://www.seqan.de]
# Query: Query_Numero_Tres with args
# Database: The Foo Database
# Fields: query id, subject id, % identity, alignment length, mismatches, gap opens, q. start, q. end, s. start, s. end, evalue, bit score
# 2 hits found
Query_Numero_Tres	Subject_Numero_Dos	100.00	8	0	0	1	8	10	17	0.007	18.9
Query_Numero_Tres	Subject_Numero_Uno	87.50	8	1	0	1	8	184	191	0.026	16.9
# BLAST processed 3 queries

Assignment 9

Objective

Up until now you have only printed the tabular with comment lines format. What do you have to do to print without comment lines? In legacy format? What about the pairwise format?

Solution
Tabular without comment lines

Add context(outfile).tabularSpec = BlastTabularSpec::NO_COMMENTS at l.53. Remember to use .m8 as file extension!

The result should look like this:

Query_Numero_Uno	Subject_Numero_Uno	71.43	14	2	1	1	12	79	92	5e-04	23.1
Query_Numero_Uno	Subject_Numero_Dos	100.00	8	0	0	3	10	157	164	0.001	22.3
Query_Numero_Tres	Subject_Numero_Dos	100.00	8	0	0	1	8	10	17	0.007	18.9
Query_Numero_Tres	Subject_Numero_Uno	87.50	8	1	0	1	8	184	191	0.026	16.9
Tabular without comment lines (legacy)

To print in legacy tabular format (with or without comment lines), add context(outfile).legacyFormat = true at l.53.

The result should look like this(legacy and NO_COMMENTS):

Query_Numero_Uno	Subject_Numero_Uno	71.43	14	4	1	1	12	79	92	5e-04	23.1
Query_Numero_Uno	Subject_Numero_Dos	100.00	8	0	0	3	10	157	164	0.001	22.3
Query_Numero_Tres	Subject_Numero_Dos	100.00	8	0	0	1	8	10	17	0.007	18.9
Query_Numero_Tres	Subject_Numero_Uno	87.50	8	1	0	1	8	184	191	0.026	16.9
Pairwise format

To print in the pairwise format replace l.52 with BlastReportFileOut<TContext> outfile(argv[1]);.

Remember to use .m0 as file extension!

The result should look like this:

BLASTP 2.2.26+ [I/O Module of SeqAn-2.0.0, https://www.seqan.de]


Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs",  Nucleic Acids Res. 25:3389-3402.



Reference for SeqAn: Doering, A., D. Weese, T. Rausch, K. Reinert (2008): SeqAn --
An efficient, generic C++ library for sequence analysis. BMC Bioinformatics,
9(1), 11. BioMed Central Ltd. doi:10.1186/1471-2105-9-11



Database: The Foo Database
           2 sequences; 400 total letters


Query= Query_Numero_Uno with args

Length=12
                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

Subject_Numero_Uno                                                   23  0.0005
Subject_Numero_Dos                                                   22  0.0009

ALIGNMENTS
> Subject_Numero_Uno
Length=200

 Score =  23.1 bits (48), Expect =  0.0005
 Identities = 10/14 (71%), Positives = 12/14 (86%), Gaps = 2/14 (14%)

Query  1   VAYAQ--PRKLCYP  12
           VAYAQ  PR+LC+P
Sbjct  79  VAYAQTKPRRLCFP  92


> Subject_Numero_Dos
Length=200

 Score =  22.3 bits (46), Expect =  0.0009
 Identities = 8/8 (100%), Positives = 8/8 (100%), Gaps = 0/8 (0%)

Query  3    YAQPRKLC  10
            YAQPRKLC
Sbjct  157  YAQPRKLC  164



Lambda     K      H
   0.267   0.0410   0.1400

Gapped
Lambda     K      H
   0.267   0.0410   0.1400

Effective search space used: 4800


Query= Query_Numero_Dos with args

Length=10


***** No hits found *****



Lambda     K      H
   0.267   0.0410   0.1400

Gapped
Lambda     K      H
   0.267   0.0410   0.1400

Effective search space used: 4000


Query= Query_Numero_Tres with args

Length=8
                                                                   Score     E
Sequences producing significant alignments:                       (Bits)  Value

Subject_Numero_Dos                                                   18  0.007
Subject_Numero_Uno                                                   16  0.03

ALIGNMENTS
> Subject_Numero_Dos
Length=200

 Score =  18.9 bits (37), Expect =  0.007
 Identities = 8/8 (100%), Positives = 8/8 (100%), Gaps = 0/8 (0%)

Query  1   AVITSFTQ  8
           AVITSFTQ
Sbjct  10  AVITSFTQ  17


> Subject_Numero_Uno
Length=200

 Score =  16.9 bits (32), Expect =  0.03
 Identities = 7/8 (88%), Positives = 8/8 (100%), Gaps = 0/8 (0%)

Query  1    AVITSFTQ  8
            AVITS+TQ
Sbjct  184  AVITSWTQ  191



Lambda     K      H
   0.267   0.0410   0.1400

Gapped
Lambda     K      H
   0.267   0.0410   0.1400

Effective search space used: 3200


  Database: The Foo Database
  Number of letters in database: 400
  Number of sequences in database:  2



Matrix:BLOSUM62
Gap Penalties: Existence: 11, Extension: 1