Iterators

Learning Objective
You will learn how to use iterators to traverse containers in SeqAn. After this tutorial, you will be ready to continue with the tutorials about iterating on more complex structures, e.g. Index Iterators.
Difficulty
Basic
Duration
30 min
Prerequsites
Sequences

Iterators are objects that can be used to browse through the values of containers such as Strings or Segments. SeqAn also offers a range of iterators to traverse efficiently more complex data structures, e.g. Graphs, whose specific usage will be explained in the corresponding tutorials. This tutorial will introduce you into the basic concept of iterators using String iterators as illustration.

Defining Iterators

This section will show you how to define different kinds of iterators.

The metafunction Iterator can be used to determine the appropriate iterator type for a given a container. Some containers offer several kinds of iterators, which can be selected by an optional argument of Iterator. For example, the tag Standard can be used to get an iterator type that resembles the C++ standard random access iterator. The more elaborated Rooted iterator, i.e., an iterator that knows its container, can be selected by specifying the Rooted tag. The construction of an iterator in SeqAn, e.g. for a Dna String, could look like the following:

    Iterator<DnaString>::Type           it1;  // A standard iterator
    Iterator<DnaString, Standard>::Type it2;  // Same as above
    Iterator<DnaString, Rooted>::Type   it3;  // A rooted iterator

Tip

The default iterator implementation is Standard. Rooted iterators offer some convenience for the user. They offer additional functions like container for determining the container on which the iterator works, and they simplify the interface for other functions like atEnd. Moreover, rooted iterators may change the container’s length or capacity, which makes it possible to implement a more intuitive variant of a remove algorithm.

While rooted iterators can usually be converted into standard iterators, it is not always possible to convert standard iterators back into rooted iterators, since standard iterators may lack the information about the container they work on. Therefore, many functions that return iterators like begin or end return rooted iterators instead of standard iterators; this way, they can be used to set both rooted and standard iterator variables. Alternatively it is possible to specify the returned iterator type explicitly by passing the iterator kind as a tag argument, e.g. begin(str, Standard()).

Traversing Containers

In this section you will learn how to iterate over a container using the basic functionality of iterators.

An iterator always points to one value of the container. The function value, which is equivalent to the operator*, can be used to access this value by reference. In contrast getValue return a copy of the value. Functions like goNext or goPrevious, which are equivalent to operator++ and operator-- respectively, can be used to move the iterator to other values within the container.

The functions begin and end, applied to a container, return iterators to the begin and to the end of the container. Note that similar to C++ standard library iterators, the iterator returned by end does not point to the last value of the container but to the position behind the last one. If the container is empty then end() == begin().

The following code prints out a sequence and demonstrates how to iterate over a string.

    DnaString genome = "ACGTACGTACGT";
    typedef Iterator<DnaString>::Type TIterator;
    for (TIterator it = begin(genome); it != end(genome); goNext(it))
    {
        std::cout << value(it);
    }
ACGTACGTACGT

A Working Example

Let us now clarify the usage of iterators with a working example. The following program demonstrates the usage of iterators.

#include <iostream>
#include <seqan/sequence.h>
#include <seqan/stream.h>

using namespace seqan;

int main()
{

The metafunction Iterator returns the iterator type for a given container type. In this case the default implementation Standard is used.

    Dna5String genome = "TATANNNGCGCG";
    Iterator<Dna5String>::Type it = begin(genome);
    Iterator<Dna5String>::Type itEnd = end(genome);

We can use iterators to iterate over the elements of a container, e.g. to print the elements.

    while (it != itEnd)
    {
        std::cout << *it;
        ++it;
    }
    std::cout << std::endl;

Instead of comparing the two iterators it and itEnd, we could also use the function atEnd to check whether we reached the end of the container.

    for (goBegin(it, genome); !atEnd(it, genome); goNext(it))
    {
        std::cout << *it;
    }
    std::cout << std::endl;

Next we will use Rooted Iterators. Since Rooted Iterators know their container, the functions goBegin and atEnd do not need to get the container as an argument. The following example prints for each element of the Dna5 String genome its complement:

    Iterator<Dna5String, Rooted>::Type it2 = begin(genome);
    for (goBegin(it2); !atEnd(it2); goNext(it2))
    {
        if (getValue(it2) == 'A')
            std::cout << 'T';
        else if (getValue(it2) == 'T')
            std::cout << 'A';
        else if (getValue(it2) == 'G')
            std::cout << 'C';
        else if (getValue(it2) == 'C')
            std::cout << 'G';
        else
            std::cout << 'N';
    }
    std::cout << std::endl;

Some iterators support iteration in reverse order with goPrevious as you can see in the next example. Note that goPrevious is called before the value of it2 is accessed. Remember that the end position of a container is always the position behind the last item in the container.

    goEnd(it2);
    while (!atBegin(it2))
    {
        goPrevious(it2);
        std::cout << getValue(it2);
    }
    std::cout << std::endl;

assignValue can be used to change the value of an iterator.

    assignValue(begin(genome), 'N');
    std::cout << genome << std::endl;

    return 0;
}

The output of the program is as follows.

TATANNNGCGCG
TATANNNGCGCG
ATATNNNCGCGC
GCGCGNNNATAT
NATANNNGCGCG

Assignment 1

Type
Review
Objective

Copy the code below, which replaces all N’s of a given String with A’s. Adjust the code to use iterators to traverse the container. Use the Standard iterator.

#include <iostream>
#include <seqan/sequence.h>
#include <seqan/stream.h>

using namespace seqan;

int main()
{
    Dna5String genome = "ANTGGTTNCAACNGTAANTGCTGANNNACATGTNCGCGTGTA";
    for (unsigned i = 0; i < length(genome); ++i)
    {
        if (genome[i] == 'N')
            genome[i] = 'A';
    }
    std::cout << "Modified genome: " << genome << std::endl;
    return 0;
}

Solution

Click more... to see the solution.

#include <iostream>
#include <seqan/sequence.h>
#include <seqan/stream.h>

using namespace seqan;

int main()
{
    Dna5String genome = "ANTGGTTNCAACNGTAANTGCTGANNNACATGTNCGCGTGTA";

    Iterator<Dna5String>::Type it = begin(genome);
    Iterator<Dna5String>::Type itEnd = end(genome);

    for (; it != itEnd; goNext(it))
    {
        if (getValue(it) == 'N')
            value(it) = 'A';
    }
    std::cout << "Modified genome: " << genome << std::endl;
    return 0;
}

Assignment 2

Type
Application
Objective
Use the code from above and change the Standard to a Rooted iterator. Try to shorten the code wherever possible.
Solution

Click more... to see the solution.

#include <iostream>
#include <seqan/sequence.h>
#include <seqan/stream.h>

using namespace seqan;

int main()
{
    Dna5String genome = "ANTGGTTNCAACNGTAANTGCTGANNNACATGTNCGCGTGTA";

    Iterator<Dna5String, Rooted>::Type it = begin(genome);

    for (; !atEnd(it); goNext(it))
    {
        if (getValue(it) == 'N')
            value(it) = 'A';
    }
    std::cout << "Modified genome: " << genome << std::endl;
    return 0;
}

Workshop Assignment 3

Type
Review
Objective

In this assignment, we pick up the example from the workshop assignments from the sequences tutorial. Take the last solution and change the code to use Iterators. First, use Standard Iterators to do this.

#include <iostream>
#include <seqan/sequence.h>
#include <seqan/stream.h>

using namespace seqan;
// Function to print simple alignment between two sequences with the same length
template <typename TText1, typename TText2>
void printAlign(TText1 const & genomeFragment, TText2 const & read)
{
    std::cout <<  "Alignment " << std::endl;
    std::cout << "  genome : " << genomeFragment << std::endl;
    std::cout << "  read   : " << read << std::endl;
}

int main(int, char const **)
{
    // Build reads and genomes
    DnaString chr1 = "TATAATATTGCTATCGCGATATCGCTAGCTAGCTACGGATTATGCGC"
                     "TCTGCGATATATCGCGCTAGATGTGCAGCTCGATCGAATGCACGTGT"
                     "GTGCGATCGATTAGCGTCGATCATCGATCTATATTAGCGCGCGGTAT"
                     "CGGACGATCATATTAGCGGTCTAGCATTTAG";

    // Build List containing all reads
    typedef String<DnaString> DnaList;
    DnaList readList;
    resize(readList, 4);
    readList[0] = "TTGCTATCGCGATATCGCTAGCTAGCTACGGATTATGCGCTCTGCGATATATCGCGCT";
    readList[1] = "TCGATTAGCGTCGATCATCGATCTATATTAGCGCGCGGTATCGGACGATCATATTAGCGGTCTAGCATT";
    readList[2] = "AGCCTGCGTACGTTGCAGTGCGTGCGTAGACTGTTGCAAGCCGGGGGTTCATGTGCGCTGAAGCACACATGCACA";
    readList[3] = "CGTGCACTGCTGACGTCGTGGTTGTCACATCGTCGTGCGTGCGTACTGCTGCTGACA";

    // Append a second chromosome sequence fragment to chr1
    DnaString chr2 = "AGCCTGCGTACGTTGCAGTGCGTGCGTAGACTGTTGCAAGCCGGGGGT"
                     "TCATGTGCGCTGAAGCACACATGCACACGTCTCTGTGTTCCGACGTGT"
                     "GTCACGTGCACTGCTGACGTCGTGGTTGTCACATCGTCGTGCGTGCGT"
                     "ACTGCTGCTGACACATGCTGCTG";
    append(chr1, chr2);

    // Print readlist
    std::cout << " \n Read list: " << std::endl;
    for(unsigned i = 0; i < length(readList); ++i)
        std::cout << readList[i] << std::endl;

    // Assume we have mapped the 4 reads to chr1 (and chr2) and now have the mapping start positions (no gaps).
    // Store the start position in a String: 7, 100, 172, 272
    String<unsigned> alignPosList;
    resize(alignPosList, 4);
    alignPosList[0] = 7;
    alignPosList[1] = 100;
    alignPosList[2] = 172;
    alignPosList[3] = 272;

    // Print alignments using Segment
    std::cout << " \n Print alignment using Segment: " << std::endl;
    for(unsigned i = 0; i < length(readList); ++i)
    {
        // Begin and end position of a given alignment between the read and the genome
        unsigned beginPosition = alignPosList[i];
        unsigned endPosition = beginPosition + length(readList[i]);
        // Build infix
        Infix<DnaString>::Type genomeFragment = infix(chr1, beginPosition, endPosition);
        // Call of our function to print the simple alignment
        printAlign(genomeFragment, readList[i]);
    }

    // Iterators :)
    // Print alignments using Iterators: Do the same as above, but use Iterators to iterate over the read list.
    // First, use Standard Iterators: Build two iterators it and itEnd to traverse readList.

    std::cout << " \n Print alignment using Standard Iterators: " << std::endl;

    return 0;
}
Solution

Click more... to see the solution

#include <iostream>
#include <seqan/sequence.h>
#include <seqan/stream.h>

using namespace seqan;
// Function to print simple alignment between two sequences with the same length
template <typename TText1, typename TText2>
void printAlign(TText1 const & genomeFragment, TText2 const & read)
{
    std::cout <<  "Alignment " << std::endl;
    std::cout << "  genome : " << genomeFragment << std::endl;
    std::cout << "  read   : " << read << std::endl;
}

int main(int, char const **)
{
    // Build reads and genomes
    DnaString chr1 = "TATAATATTGCTATCGCGATATCGCTAGCTAGCTACGGATTATGCGCTCTGCGATATATCGCGCTAGATGTGCAGCTCGATCGAATGCACGTGTGTGCGATCGATTAGCGTCGATCATCGATCTATATTAGCGCGCGGTATCGGACGATCATATTAGCGGTCTAGCATTTAG";

    // Build List containing all reads
    typedef String<DnaString> TDnaList;
    TDnaList readList;
    resize(readList, 4);
    readList[0] = "TTGCTATCGCGATATCGCTAGCTAGCTACGGATTATGCGCTCTGCGATATATCGCGCT";
    readList[1] = "TCGATTAGCGTCGATCATCGATCTATATTAGCGCGCGGTATCGGACGATCATATTAGCGGTCTAGCATT";
    readList[2] = "AGCCTGCGTACGTTGCAGTGCGTGCGTAGACTGTTGCAAGCCGGGGGTTCATGTGCGCTGAAGCACACATGCACA";
    readList[3] = "CGTGCACTGCTGACGTCGTGGTTGTCACATCGTCGTGCGTGCGTACTGCTGCTGACA";

    // Append a second chromosome sequence fragment to chr1
    DnaString chr2 = "AGCCTGCGTACGTTGCAGTGCGTGCGTAGACTGTTGCAAGCCGGGGGTTCATGTGCGCTGAAGCACACATGCACACGTCTCTGTGTTCCGACGTGTGTCACGTGCACTGCTGACGTCGTGGTTGTCACATCGTCGTGCGTGCGTACTGCTGCTGACACATGCTGCTG";
    append(chr1, chr2);

    // Print readlist
    std::cout << " \n Read list: " << std::endl;
    for (unsigned i = 0; i < length(readList); ++i)
        std::cout << readList[i] << std::endl;

    // Assume we have mapped the 4 reads to chr1 (and chr2) and now have the mapping start positions (no gaps).
    // Store the start position in a String alignPosList: 7, 100, 172, 272
    String<unsigned> alignPosList;
    resize(alignPosList, 4);
    alignPosList[0] = 7;
    alignPosList[1] = 100;
    alignPosList[2] = 172;
    alignPosList[3] = 272;

    // Print alignments using Segment
    std::cout << " \n Print alignment using Segment: " << std::endl;
    for (unsigned i = 0; i < length(readList); ++i)
    {
        // Temporary copy of begin and end position (beginPosition) from alignPosList
        // of a given alignment between the read and the genome
        unsigned beginPosition = alignPosList[i];
        unsigned endPosition = beginPosition + length(readList[i]);
        // Build infix
        Infix<DnaString>::Type genomeFragment = infix(chr1, beginPosition, endPosition);
        // Call of our function to print the simple alignment
        printAlign(genomeFragment, readList[i]);
    }
    // Iterators :)
    // Print alignments using Iterators: Do the same as above, but use Iterators to iterate over the read list.
    // First, use Standard Iterators.
    Iterator<TDnaList>::Type it = begin(readList);
    Iterator<TDnaList, Standard>::Type itEnd = end(readList); //same Iterator as above

    std::cout << " \n Print alignment using Standard Iterators: " << std::endl;
    for (; it != itEnd; goNext(it))
    {
        // Get the right index for alignPosList
        int i = position(it, readList);
        // Temporary copy of begin and end position (beginPosition) from alignPosList
        // of a given alignment between the read and the genome
        unsigned beginPosition = alignPosList[i];
        unsigned endPosition = beginPosition + length(value(it));
        // Build Infix
        Infix<DnaString>::Type genomeFragment = infix(chr1, beginPosition, endPosition);
        // Call of our function to print the simple alignment
        printAlign(genomeFragment, value(it));
    }

    return 0;
}

Workshop Assignment 4

Type
Review
Objective
Now, use rooted iterators in the example from Workshop Assignment 3.
Solution

Click more... to see the solution.

#include <iostream>
#include <seqan/sequence.h>
#include <seqan/stream.h>

using namespace seqan;
// Function to print simple alignment between two sequences with the same length
template <typename TText1, typename TText2>
void printAlign(TText1 const & genomeFragment, TText2 const & read)
{
    std::cout <<  "Alignment " << std::endl;
    std::cout << "  genome : " << genomeFragment << std::endl;
    std::cout << "  read   : " << read << std::endl;
}

int main(int, char const **)
{
    // Build reads and genomes
    DnaString chr1 = "TATAATATTGCTATCGCGATATCGCTAGCTAGCTACGGATTATGCGCTCTGCGATATATCGCGCTAGATGTGCAGCTCGATCGAATGCACGTGTGTGCGATCGATTAGCGTCGATCATCGATCTATATTAGCGCGCGGTATCGGACGATCATATTAGCGGTCTAGCATTTAG";
    // Build List containing all reads
    typedef String<DnaString> TDnaList;
    TDnaList readList;
    resize(readList, 4);
    readList[0] = "TTGCTATCGCGATATCGCTAGCTAGCTACGGATTATGCGCTCTGCGATATATCGCGCT";
    readList[1] = "TCGATTAGCGTCGATCATCGATCTATATTAGCGCGCGGTATCGGACGATCATATTAGCGGTCTAGCATT";
    readList[2] = "AGCCTGCGTACGTTGCAGTGCGTGCGTAGACTGTTGCAAGCCGGGGGTTCATGTGCGCTGAAGCACACATGCACA";
    readList[3] = "CGTGCACTGCTGACGTCGTGGTTGTCACATCGTCGTGCGTGCGTACTGCTGCTGACA";
    // Append a second chromosome sequence fragment to chr1
    DnaString chr2 = "AGCCTGCGTACGTTGCAGTGCGTGCGTAGACTGTTGCAAGCCGGGGGTTCATGTGCGCTGAAGCACACATGCACACGTCTCTGTGTTCCGACGTGTGTCACGTGCACTGCTGACGTCGTGGTTGTCACATCGTCGTGCGTGCGTACTGCTGCTGACACATGCTGCTG";
    append(chr1, chr2);
    // Print readlist
    std::cout << " \n Read list: " << std::endl;
    for (unsigned i = 0; i < length(readList); ++i)
        std::cout << readList[i] << std::endl;
    // Assume we have mapped the 4 reads to chr1 (and chr2) and now have the mapping start positions (no gaps).
    // Store the start position in a String alignPosList: 7, 100, 172, 272
    String<unsigned> alignPosList;
    resize(alignPosList, 4);
    alignPosList[0] = 7;
    alignPosList[1] = 100;
    alignPosList[2] = 172;
    alignPosList[3] = 272;
    // Print alignments using Segment
    std::cout << " \n Print alignment using Segment: " << std::endl;
    for (unsigned i = 0; i < length(readList); ++i)
    {
        // Temporary copy of begin and end position (beginPosition) from alignPosList
        // of a given alignment between the read and the genome
        unsigned beginPosition = alignPosList[i];
        unsigned endPosition = beginPosition + length(readList[i]);
        // Build infix
        Infix<DnaString>::Type genomeFragment = infix(chr1, beginPosition, endPosition);
        // Call of our function to print the simple alignment
        printAlign(genomeFragment, readList[i]);
    }
    // Iterators :)
    // Print alignments using Iterators: Do the same as above, but use Iterators to iterate over the read list.
    // First, use Standard Iterators.
    Iterator<TDnaList>::Type it = begin(readList);
    Iterator<TDnaList, Standard>::Type itEnd = end(readList); //same Iterator as above

    std::cout << " \n Print alignment using Standard Iterators: " << std::endl;
    for (; it != itEnd; goNext(it))
    {
        // Get the right index for alignPosList
        int i = position(it, readList);
        // Temporary copy of begin and end position (beginPosition) from alignPosList
        // of a given alignment between the read and the genome
        unsigned beginPosition = alignPosList[i];
        unsigned endPosition = beginPosition + length(value(it));
        // Build Infix
        Infix<DnaString>::Type genomeFragment = infix(chr1, beginPosition, endPosition);
        // Call of our function to print the simple alignment
        printAlign(genomeFragment, value(it));
    }
    // Now, use Rooted Iterators.
    Iterator<TDnaList, Rooted>::Type it2 = begin(readList);
    std::cout << " \n Print alignment using Rooted Iterators: " << std::endl;
    for (; !atEnd(it2); goNext(it2))
    {
        int i = position(it2);
        // Temporary copy of begin and end position (beginPosition) from alignPosList
        // of a given alignment between the read and the genome
        unsigned beginPosition = alignPosList[i];
        unsigned endPosition = beginPosition + length(value(it2));
        // Build Infix
        Infix<DnaString>::Type genomeFragment = infix(chr1, beginPosition, endPosition);
        // Call of our function to print the simple alignment
        printAlign(genomeFragment, value(it2));
    }
    return 0;
}