Alphabets
- Learning Objective: You will learn the details about the alphabets in SeqAn.
- Difficulty: Basic
- Duration: 15 min
- Prerequisites: A First Example
This tutorial describes the different alphabets used in SeqAn; in other words, you will learn about the types that a SeqAn String can contain. To continue with the other tutorials, it is enough to know that SeqAn predefines several standard alphabets, e.g. Dna, Dna5, Rna, Rna5, Iupac, AminoAcid.
Types
Any type that provides a default constructor, a copy constructor and an assignment operator can be used as the alphabet / contained type of a String (see also the tutorial Sequences).
This includes the C++ POD types, e.g. char, int, double etc.
In addition, you can use more complex types like String as the contained type of strings, e.g. String<String<char> >.
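The three requirements named above (default constructor, copy constructor, assignment operator) can be checked at compile time in standard C++. The following is a minimal sketch that does not use SeqAn; `Codon` and `isUsableAsValueType` are hypothetical names introduced only for illustration.

```cpp
#include <type_traits>

// Hypothetical user-defined type, introduced only for this illustration.
struct Codon
{
    char bases[3];
};

// Hypothetical helper: true if T has a default constructor, a copy
// constructor and a copy assignment operator.
template <typename T>
constexpr bool isUsableAsValueType()
{
    return std::is_default_constructible<T>::value &&
           std::is_copy_constructible<T>::value &&
           std::is_copy_assignable<T>::value;
}

static_assert(isUsableAsValueType<char>(), "char qualifies");
static_assert(isUsableAsValueType<Codon>(), "Codon qualifies");
// A reference type has no default constructor, so it fails the check.
static_assert(!isUsableAsValueType<int &>(), "int& does not qualify");
```

If one of the static assertions were violated, compilation would stop with the given message, which is usually easier to read than an error from deep inside a container template.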
SeqAn also provides the following types that are useful in bioinformatics. Each of them is a specialization of the class SimpleType.
Specialization | Description |
---|---|
AminoAcid | Amino acid alphabet |
Dna | DNA alphabet |
Dna5 | DNA alphabet including N character |
DnaQ | DNA alphabet plus phred quality |
Dna5Q | DNA alphabet plus phred quality including N character |
Finite | Finite alphabet of fixed size |
Iupac | DNA Iupac code |
Rna | RNA alphabet |
Rna5 | RNA alphabet including N character |
Functionality
In SeqAn, alphabets are value types that can take only a limited number of values and can therefore be mapped to a range of natural numbers. The number of different values of an alphabet, the alphabet size, can be retrieved with the metafunction ValueSize.
typedef Dna TAlphabet;
unsigned alphSize = ValueSize<TAlphabet>::VALUE;
std::cout << "Alphabet size of Dna: " << alphSize << '\n';
Alphabet size of Dna: 4
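A metafunction such as ValueSize is evaluated at compile time: it maps a type to a constant. The following sketch mimics that idea in standard C++ without SeqAn; `MyDna`, `MyDna5` and `MyValueSize` are hypothetical stand-ins and do not show SeqAn's actual implementation.

```cpp
// Hypothetical tag types standing in for two alphabets.
struct MyDna {};
struct MyDna5 {};

// Hypothetical metafunction. The primary template is only declared, so
// asking for the size of an unknown alphabet is a compile-time error.
template <typename TAlphabet>
struct MyValueSize;

template <>
struct MyValueSize<MyDna>
{
    static const unsigned VALUE = 4;  // A, C, G, T
};

template <>
struct MyValueSize<MyDna5>
{
    static const unsigned VALUE = 5;  // A, C, G, T, N
};

// The result is a compile-time constant, so it can be used, for
// example, as an array bound (one counter per letter).
typedef int TCountTable[MyValueSize<MyDna5>::VALUE];
```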
Another useful metafunction called BitsPerValue can be used to determine the number of bits needed to store a value of a given alphabet.
unsigned bits = BitsPerValue<TAlphabet>::VALUE;
std::cout << "Number of bits needed to store a value of type Dna: " << bits << '\n';
Number of bits needed to store a value of type Dna: 2
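The relation between alphabet size and bits per value can be worked out by hand: n distinct values need the smallest b with 2^b >= n. The following is a minimal sketch of that computation in standard C++, assuming this is the relation BitsPerValue follows for the simple alphabets; `bitsNeeded` is a hypothetical helper, not a SeqAn function.

```cpp
// Hypothetical helper: smallest number of bits b such that 2^b >= size.
// Halving the size (rounding up) removes exactly one bit per step.
constexpr unsigned bitsNeeded(unsigned size)
{
    return size <= 1 ? 0 : 1 + bitsNeeded((size + 1) / 2);
}

// Four letters (A, C, G, T) fit into two bits.
static_assert(bitsNeeded(4) == 2, "2 bits for 4 values");
// A fifth letter (N) no longer fits and requires a third bit.
static_assert(bitsNeeded(5) == 3, "3 bits for 5 values");
```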
The order of a character in the alphabet (i.e. its corresponding natural number) can be retrieved by calling the function ordValue. See each specialization’s documentation for the ordering of the alphabet’s values.
Dna a = 'A';
Dna c = 'C';
Dna g = 'G';
Dna t = 'T';
std::cout << "A: " << static_cast<unsigned>(ordValue(a)) << '\n';
std::cout << "C: " << static_cast<unsigned>(ordValue(c)) << '\n';
std::cout << "G: " << static_cast<unsigned>(ordValue(g)) << '\n';
std::cout << "T: " << static_cast<unsigned>(ordValue(t)) << '\n';
A: 0
C: 1
G: 2
T: 3
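Conceptually, such an alphabet type can store just the ordinal and convert to and from characters on demand. The following is a minimal sketch in standard C++ without SeqAn, using the ordering A, C, G, T shown above; `MiniDna`, `ordinal` and `toChar` are hypothetical names, and mapping every unknown character to A is an assumption made for this sketch.

```cpp
#include <cstdint>

// Hypothetical stand-in for a four-letter alphabet; not SeqAn's Dna.
struct MiniDna
{
    std::uint8_t value;  // ordinal in the range 0..3

    MiniDna() : value(0) {}

    // Convert a character to its ordinal. Assumption of this sketch:
    // any character other than C, G, T (upper or lower case) maps to A.
    MiniDna(char c) : value(0)
    {
        switch (c)
        {
            case 'C': case 'c': value = 1; break;
            case 'G': case 'g': value = 2; break;
            case 'T': case 't': value = 3; break;
            default: break;
        }
    }
};

// Return the ordinal as unsigned so that it prints as a number.
inline unsigned ordinal(MiniDna d)
{
    return d.value;
}

// Map the ordinal back to its character.
inline char toChar(MiniDna d)
{
    return "ACGT"[d.value];
}
```

Storing the ordinal instead of the character is what allows a value to be kept in fewer bits than a full char.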
Tip
The return type of the ordValue function is determined by the metafunction ValueSize: ValueSize<T>::Type is the type that uses the least amount of memory while being able to represent all possible values.
E.g. for Dna this type is an _uint8, which can represent 256 different values.
However, note that std::cout prints such a value as a character, and many of these values have no visible symbol on the screen, hence a cast to unsigned might be necessary.
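The effect described in this tip can be reproduced in standard C++ without SeqAn. The sketch below assumes the common case in which std::uint8_t is an alias for unsigned char; `printRaw` and `printAsNumber` are hypothetical helper names.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Stream the value as-is: the stream treats it as a character.
inline std::string printRaw(std::uint8_t v)
{
    std::ostringstream os;
    os << v;
    return os.str();
}

// Cast to unsigned first: the stream prints the numeric value.
inline std::string printAsNumber(std::uint8_t v)
{
    std::ostringstream os;
    os << static_cast<unsigned>(v);
    return os.str();
}
```

With the value 65, printRaw yields the character A on an ASCII system, while printAsNumber yields 65. For small ordinals such as 0 to 3, the raw output consists of non-printable control characters, which is why the examples above cast before printing.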
Assignment 1
- Type: Application
- Objective: In this task you will learn how to access all the letters of an alphabet. Use the piece of code below and adjust the function showAllLettersOfMyAlphabet() to go through all the characters of the current alphabet and print them.

#include <seqan/sequence.h>
#include <seqan/basic.h>
#include <iostream>

using namespace seqan;

// We want to define a function, which takes
// the alphabet type as an argument
template <typename TAlphabet>
void showAllLettersOfMyAlphabet(TAlphabet const &)
{
    // ...
}

int main()
{
    showAllLettersOfMyAlphabet(AminoAcid());
    showAllLettersOfMyAlphabet(Dna());
    showAllLettersOfMyAlphabet(Dna5());
    return 0;
}
- Hints: You will need the metafunction ValueSize.
- Solution
#include <seqan/sequence.h>
#include <seqan/basic.h>
#include <iostream>

using namespace seqan;

// We define a function which takes
// the alphabet type as an argument
template <typename TAlphabet>
void showAllLettersOfMyAlphabet(TAlphabet const &)
{
    typedef typename ValueSize<TAlphabet>::Type TSize;
    // We need to determine the alphabet size
    // using the metafunction ValueSize
    TSize alphSize = ValueSize<TAlphabet>::VALUE;
    // We iterate over all characters of the alphabet
    // and output them
    for (TSize i = 0; i < alphSize; ++i)
        std::cout << static_cast<unsigned>(i) << ',' << TAlphabet(i) << " ";
    std::cout << std::endl;
}

int main()
{
    showAllLettersOfMyAlphabet(AminoAcid());
    showAllLettersOfMyAlphabet(Dna());
    showAllLettersOfMyAlphabet(Dna5());
    return 0;
}