Modifiers

Learning Objective
In this tutorial you will learn how to modify the elements of a container without copying them using SeqAn modifiers. You will learn about the different specializations and how to work with them.
Difficulty
Basic
Duration
20 min
Prerequisites
A First Example, Sequences

Overview

Modifiers provide a different view of other classes. They can be used to change how the elements of a container appear without touching the underlying data. For example, suppose someone gave you an algorithm that works on two arbitrary String objects, but you want to use it on the special pair of a string and its reverse (left-to-right mirror). The classical approach would be to make a copy of the string in which all elements are mirrored from left to right, and then call the algorithm with both strings. With modifiers, e.g. a ModifiedString, you can create the reverse in \(\mathcal{O}(1)\) extra memory without copying the original string. This can be handy if the original sequence is large.

Modifiers implement a certain concept (e.g. ContainerConcept, Iterator, ...) or class interface (String, ...) and thus can be used as such. The mirror modifier is already part of SeqAn. It implements the class interface of String, so it can be used in every algorithm that works on strings.
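The idea of running an existing algorithm on a string and its mirror image without a reversed copy can be sketched with the standard library alone. This is only an illustration of the principle, not SeqAn code: countMatches and matchesWithReverse are hypothetical names, and std::string with its reverse iterators stands in for String and the reverse modifier.

```cpp
#include <cstddef>
#include <string>

// Hypothetical algorithm: counts the positions at which two ranges of the
// same length hold equal elements.
template <typename TIterA, typename TIterB>
std::size_t countMatches(TIterA beginA, TIterA endA, TIterB beginB)
{
    std::size_t matches = 0;
    for (; beginA != endA; ++beginA, ++beginB)
        if (*beginA == *beginB)
            ++matches;
    return matches;
}

// Compares a string with its own mirror image. The reverse iterators read
// the original characters back to front, so no reversed copy is created.
inline std::size_t matchesWithReverse(const std::string& text)
{
    return countMatches(text.begin(), text.end(), text.rbegin());
}
```

Calling matchesWithReverse on a palindrome returns the length of the string, because every position agrees with its mirrored counterpart.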

The Modified String

The ModifiedString is a modifier that implements the String interface and thus can be used like a String. It has two template parameters. The first one specifies a sequence type (e.g. String, Segment, ...) and the second one specifies the modifier's behavior. That can be ModReverseString for mirroring a string left to right or ModViewModifiedString for applying a function to every single character (like ‘C’->‘G’, ‘A’->‘T’, ...).

ModReverse

We begin with the specialization ModReverseString from the example above. Suppose we are given a string:

#include <iostream>
#include <seqan/file.h>
#include <seqan/modifier.h>

using namespace seqan;


int main ()
{
	String<char> myString = "A man, a plan, a canal-Panama";

and want to get the reverse. So we need a ModifiedString specialized with String<char> and ModReverseString. We create the modifier and link it with myString:

	ModifiedString< String<char>, ModReverse > myModifier(myString);

The result is:

	std::cout << myString << std::endl;
	std::cout << myModifier << std::endl;
A man, a plan, a canal-Panama
amanaP-lanac a ,nalp a ,nam A

To verify that we didn’t copy myString, we replace an infix of the original string and see that, as a side effect, the modified string has also changed:

	replace(myString, 9, 9, "master ");
	std::cout << myString << std::endl;
	std::cout << myModifier << std::endl;
	return 0;
}
A man, a master plan, a canal-Panama
amanaP-lanac a ,nalp retsam a ,nam A
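The sharing behavior seen in this output can be imitated in plain standard C++. The following is a minimal sketch under stated assumptions, not SeqAn's implementation: ReverseView is a hypothetical class that keeps a pointer to a std::string host and translates positions on access, and str() exists only so the view can be printed or compared.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a reverse modifier: it stores a pointer to the
// host string and translates positions on access instead of copying.
class ReverseView
{
public:
    explicit ReverseView(const std::string& host) : host_(&host) {}

    std::size_t size() const { return host_->size(); }

    // Position i of the view is position size() - 1 - i of the host.
    char operator[](std::size_t i) const
    {
        return (*host_)[host_->size() - 1 - i];
    }

    // Materializes the current state of the view for printing or comparing.
    std::string str() const
    {
        return std::string(host_->rbegin(), host_->rend());
    }

private:
    const std::string* host_;  // the view never owns or copies the host
};
```

Because the view only refers to the host, inserting "master " into the host at position 9 after the view was created changes what the view shows, just as in the SeqAn example. The host must outlive the view.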

ModView

Another specialization of the ModifiedString is the ModViewModifiedString modifier. Assume we need all characters of myString to be in upper case without copying myString. In SeqAn you first create a functor (an STL unary function) which converts a character to its upper-case character.

struct MyFunctor : public std::unary_function<char,char>
{
    inline char operator()(char x) const
    {
        if (('a' <= x) && (x <= 'z')) return (x + ('A' - 'a'));
        return x;
    }
};

and then create a ModifiedString specialized with ModView<MyFunctor>:

ModifiedString< String<char>, ModView<MyFunctor> > myModifier(myString);

The result is:

std::cout << myString << '\n';
std::cout << myModifier << '\n';
A man, a plan, a canal-Panama
A MAN, A PLAN, A CANAL-PANAMA
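How a functor-based view can leave the host untouched may be easier to see in a standard-library sketch. The code below is an assumption-laden illustration, not the way SeqAn implements ModView: TransformView and ToUpper are hypothetical names, and the view applies the functor each time a character is read.

```cpp
#include <cstddef>
#include <string>

// Hypothetical functor that mirrors MyFunctor from the example above.
struct ToUpper
{
    char operator()(char x) const
    {
        if ('a' <= x && x <= 'z')
            return static_cast<char>(x + ('A' - 'a'));
        return x;
    }
};

// Hypothetical view: applies the functor on every read access and never
// writes to or copies the host string.
template <typename TFunctor>
class TransformView
{
public:
    explicit TransformView(const std::string& host, TFunctor f = TFunctor())
        : host_(&host), f_(f) {}

    std::size_t size() const { return host_->size(); }

    char operator[](std::size_t i) const { return f_((*host_)[i]); }

    // Materializes the view for printing or comparing.
    std::string str() const
    {
        std::string out;
        out.reserve(size());
        for (std::size_t i = 0; i < size(); ++i)
            out.push_back((*this)[i]);
        return out;
    }

private:
    const std::string* host_;
    TFunctor f_;
};
```

A TransformView<ToUpper> constructed on the palindrome string reads as all upper case, while the host string itself keeps its original mixed case.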

The upper-case functor and some other predefined functors are part of SeqAn (in seqan/modifier/modifier_functors.h) already. The following functors can be used as an argument of ModViewModifiedString:

FunctorUpcase<TValue>
Converts each character of type TValue to its upper-case character
FunctorLowcase<TValue>
Converts each character of type TValue to its lower-case character
FunctorComplement<Dna>
Converts each nucleotide to its complementary nucleotide
FunctorComplement<Dna5>
The same for the Dna5 alphabet
FunctorConvert<TInValue,TOutValue>
Converts the type of each character from TInValue to TOutValue

So instead of defining our own functor, we could have used a predefined one:

ModifiedString< String<char>, ModView<FunctorUpcase<char> > > myModifier(myString);

Assignment 1

Type
Review
Objective

In this assignment you will create a modifier using your own functor. Assume you are given two Dna sequences as strings, as in the code example below. You know that in one of the sequences a few ‘C’ nucleotides have been converted into ‘T’ nucleotides, but you still want to compare the sequences. Extend the code example as follows:

  1. Write a functor which converts all ‘C’ nucleotides to ‘T’ nucleotides.
  2. Define a ModifiedString with the specialization ModViewModifiedString using this functor.
  3. Now you can modify both sequences to compare them, treating all ‘Cs’ as ‘Ts’. Print the results.
#include <iostream>
#include <seqan/file.h>
#include <seqan/modifier.h>


using namespace std;
using namespace seqan;


int main()
{
    typedef String<Dna> TSequence;                 

    TSequence seq1 = "CCCGGCATCATCC";
    TSequence seq2 = "CTTGGCATTATTC";

    std::cout << seq1 << std::endl;
    std::cout << seq2 << std::endl;
    std::cout << std::endl;
    

    return 0;
}
Solution
#include <iostream>
#include <seqan/file.h>
#include <seqan/modifier.h>


using namespace std;
using namespace seqan;


struct ConvertCT : public ::std::unary_function<Dna, Dna> 
{
    inline Dna operator()(Dna x) const 
    {
        if (x == 'C') return 'T';
        return x; 
    }
};


int main()
{
    typedef String<Dna> TSequence;                 

    TSequence seq1 = "CCCGGCATCATCC";
    TSequence seq2 = "CTTGGCATTATTC";

    std::cout << seq1 << std::endl;
    std::cout << seq2 << std::endl;
    std::cout << std::endl;
    
    typedef ModifiedString< TSequence, ModView<ConvertCT> > TModCT;
    TModCT modCT1(seq1);
    TModCT modCT2(seq2);

    std::cout << modCT1 << std::endl;
    std::cout << modCT2 << std::endl;

    return 0;
}
CCCGGCATCATCC
CTTGGCATTATTC
TTTGGTATTATTT
TTTGGTATTATTT
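The solution prints both modified sequences, and the comparison itself is left to the eye. As a hedged illustration of that last step, the sketch below performs the comparison in plain standard C++ on std::string. It does not use SeqAn, and convertCT and equalTreatingCAsT are hypothetical helper names.

```cpp
#include <algorithm>
#include <string>

// Same rule as the ConvertCT functor in the solution, written for plain char.
inline char convertCT(char x)
{
    return x == 'C' ? 'T' : x;
}

// Returns true if both sequences are equal once every 'C' is read as 'T'.
// The conversion happens during the comparison; neither input is modified.
inline bool equalTreatingCAsT(const std::string& a, const std::string& b)
{
    return a.size() == b.size() &&
           std::equal(a.begin(), a.end(), b.begin(),
                      [](char x, char y) { return convertCT(x) == convertCT(y); });
}
```

For the two sequences of the assignment this function returns true, which agrees with the two identical lines printed by the solution.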

For some commonly used modifiers you can use the following shortcuts:

Shortcut                     Substitution
ModComplementDna             ModView<FunctorComplement<Dna> >
ModComplementDna5            ModView<FunctorComplement<Dna5> >
DnaStringComplement          ModifiedString<DnaString, ModComplementDna>
Dna5StringComplement         ModifiedString<Dna5String, ModComplementDna5>
DnaStringReverse             ModifiedString<DnaString, ModReverse>
Dna5StringReverse            ModifiedString<Dna5String, ModReverse>
DnaStringReverseComplement   ModifiedString<ModifiedString<DnaString, ModComplementDna>, ModReverse>
Dna5StringReverseComplement  ModifiedString<ModifiedString<Dna5String, ModComplementDna5>, ModReverse>

The Modified Iterator

We have seen how a ModifiedString can be used to modify strings without touching or copying the original data. The same can be done with iterators. The ModifiedIterator implements the Iterator concept and thus can be used in every algorithm or data structure that expects an iterator. In fact, we have already used the ModifiedIterator unknowingly in the examples above, because in these cases the ModifiedString returns a corresponding ModifiedIterator via the Iterator meta-function. The main work is done in the ModifiedIterator, whereas the ModifiedString only overloads the begin and end functions. Normally, you are going to use the ModifiedString, and maybe the result of its Iterator meta-function, instead of using a ModifiedIterator directly.
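To make the division of labor concrete, here is a minimal sketch of an iterator adaptor in standard C++. It is an assumption for illustration only and not SeqAn's ModifiedIterator: TransformIterator, ComplementDna and complementOf are hypothetical names, and the adaptor supports just the three operations that a simple loop needs.

```cpp
#include <string>

// Hypothetical functor: complements a nucleotide given as an upper-case char.
struct ComplementDna
{
    char operator()(char x) const
    {
        switch (x)
        {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            default:  return x;
        }
    }
};

// Hypothetical iterator adaptor: wraps a host iterator and applies the
// functor whenever the iterator is dereferenced.
template <typename THostIter, typename TFunctor>
class TransformIterator
{
public:
    TransformIterator(THostIter it, TFunctor f) : it_(it), f_(f) {}

    char operator*() const { return f_(*it_); }
    TransformIterator& operator++() { ++it_; return *this; }
    bool operator!=(const TransformIterator& other) const { return it_ != other.it_; }

private:
    THostIter it_;
    TFunctor f_;
};

// Walks the host through the adaptor; all the work happens in operator*.
inline std::string complementOf(const std::string& host)
{
    typedef TransformIterator<std::string::const_iterator, ComplementDna> TIter;
    std::string out;
    TIter it(host.begin(), ComplementDna());
    const TIter last(host.end(), ComplementDna());
    for (; it != last; ++it)
        out.push_back(*it);
    return out;
}
```

In this sketch a string-like view would only need to hand out such iterators from its begin and end functions, which matches the split described above.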

Nested Modifiers

As modifiers implement a certain concept and depend on classes of this concept, two modifiers can be chained to create a new modifier. We have seen how the ModifiedString specialized with ModReverseString and ModViewModifiedString can be used. Now we want to combine them to create a modifier for the reverse complement of a DnaString. We begin with the original string:

String<Dna> myString = "attacgg";

Then we define the modifier that complements a DnaString:

typedef ModifiedString<String<Dna>, ModComplementDna>   TMyComplement;

This modifier now should be reversed from left to right:

typedef ModifiedString<TMyComplement, ModReverse>       TMyReverseComplement;

The original string can be given to the constructor.

TMyReverseComplement myReverseComplement(myString);

The result is:

std::cout << myString << '\n';
std::cout << myReverseComplement << '\n';

infix(myString, 1, 1) = "cgt";

std::cout << myString << '\n';
std::cout << myReverseComplement << '\n';
ATTACGG
CCGTAAT
ACGTTTACGG
CCGTAAACGT

Using a predefined shortcut, the whole example could be reduced to:

String<Dna> myString = "attacgg";
std::cout << myString << std::endl;
std::cout << DnaStringReverseComplement(myString) << std::endl;
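The chaining of two modifiers can also be sketched in standard C++ as one view wrapped around another. This is a simplified illustration under stated assumptions and not SeqAn's implementation: ComplementView, ReverseOf and toString are hypothetical names, and the outer view stores the inner view by value, which is cheap because the inner view only holds a pointer to the host.

```cpp
#include <cstddef>
#include <string>

// Hypothetical inner view: complements each nucleotide of the host on access.
class ComplementView
{
public:
    explicit ComplementView(const std::string& host) : host_(&host) {}

    std::size_t size() const { return host_->size(); }

    char operator[](std::size_t i) const
    {
        switch ((*host_)[i])
        {
            case 'A': return 'T';
            case 'C': return 'G';
            case 'G': return 'C';
            case 'T': return 'A';
            default:  return (*host_)[i];
        }
    }

private:
    const std::string* host_;
};

// Hypothetical outer view: mirrors any inner view that offers size() and
// operator[]. The inner view is stored by value.
template <typename TInner>
class ReverseOf
{
public:
    explicit ReverseOf(TInner inner) : inner_(inner) {}

    std::size_t size() const { return inner_.size(); }

    char operator[](std::size_t i) const { return inner_[inner_.size() - 1 - i]; }

private:
    TInner inner_;
};

// Materializes any such view for printing or comparing.
template <typename TView>
std::string toString(const TView& view)
{
    std::string out;
    out.reserve(view.size());
    for (std::size_t i = 0; i < view.size(); ++i)
        out.push_back(view[i]);
    return out;
}
```

A ReverseOf<ComplementView> built on "ATTACGG" reads as "CCGTAAT", and it follows later changes to the host string in the same way as the nested modifier in the example above.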