Consensus Alignment

Learning Objective
You will learn how to perform a consensus alignment of sequences (e.g. NGS reads) stored in a FragmentStore. After completing this tutorial, you will be able to perform a consensus alignment of reads with and without using alignment information. This is useful for the consensus step in sequence assembly.
Difficulty
Advanced
Duration
1 h
Prerequisites
Fragment Store

The SeqAn module <seqan/consensus.h> allows the computation of consensus alignments based on the method by Rausch et al. [RKD+09]. It can be used for the consensus step in Overlap-Layout-Consensus assemblers.

Consensus with Approximate Positions

The consensus module has two modes. The first one is applicable when approximate positions of the reads are known. The following program demonstrates this functionality.

First, we include the necessary headers.

#include <iostream>

#include <seqan/store.h>
#include <seqan/consensus.h>

using namespace seqan;

int main()
{

Next, the fragment store is filled with reads and approximate positions. The true alignment is shown in the comments.

    FragmentStore<> store;
    // Resize contigStore and contigNameStore (required for printing the first layout).
    resize(store.contigStore, 1);
    appendValue(store.contigNameStore, "ref");

    // Actual read layout.
    //
    // AATGGATGGCAAAATAGTTGTTCCATGAATACATCTCTAAAGAGCTTT
    //           AAAATAGTTGTTCCATGAATACATCTCTAAAGAGCTTTGATGCTAATTT
    //                AGTTGTTCCATGAATACATCTCTAAAGAGCTTTGATGCTAATTTAGTCAAATTTTCAATACTGTA
    //                               ACATCTCTAAAGAGCTTTGATGCTAATTTAGTCAAATT
    //                                         AGAGCTTTGATGCTAATTTAGTCAAATTTTCAATACTGTACAATCTTCTCTAG

    // Append reads (includes small errors).
    appendRead(store, "AATGGATGGCAAAATAGTTGTTCCATGAATACATCTCTAAAGAGCTTT");
    appendRead(store, "AAAGTAGTTGTTCCATGAATACATCTCTAAAGAGCTTTGATGCTAATTT");
    appendRead(store, "AGTTGTCCATGAATACATCTCTAAAGAGCTTTGATGCTAATTTAGTCAATTTTCAATACTGTA");
    appendRead(store, "ACATCTCTTAAAGAGCTTTGATGCTAATTTAGTCAAATT");
    appendRead(store, "AGAGCTTTGATGCTAATTTAGTCAAATTTTCAATACTGTACAATCTTCTCTAG");

    // The position used in the following are only approximate and would
    // not lead to the read layout above.
    appendAlignedRead(store, 0, 0, 0, (int)length(store.readSeqStore[0]));
    appendAlignedRead(store, 1, 0, 12, 12 + (int)length(store.readSeqStore[1]));
    appendAlignedRead(store, 2, 0, 14, 14 + (int)length(store.readSeqStore[2]));
    appendAlignedRead(store, 3, 0, 18, 18 + (int)length(store.readSeqStore[3]));
    appendAlignedRead(store, 4, 0, 25, 25 + (int)length(store.readSeqStore[4]));

    // Print the (wrong) alignment.
    std::cout << "Initial alignment\n\n";
    AlignedReadLayout layout;
    layoutAlignment(layout, store);
    printAlignment(std::cout, layout, store, /*contigID=*/ 0, /*beginPos=*/ 0, /*endPos=*/ 80, 0, 30);

This is followed by computing the consensus alignment using the function consensusAlignment.

    ConsensusAlignmentOptions options;
    options.useContigID = true;
    consensusAlignment(store, options);

Finally, the alignment is printed using an AlignedReadLayout object.

    std::cout << "Final alignment\n\n";
    layoutAlignment(layout, store);
    printAlignment(std::cout, layout, store, /*contigID=*/ 0, /*beginPos=*/ 0, /*endPos=*/ 80, 0, 30);

    return 0;
}

Here is the program’s output:

Initial alignment

--------------------------------------------------------------------------------
AATGGATGGCAAAATAGTTGTTCCATGAATACATCTCTAAAGAGCTTT
            AAAGTAGTTGTTCCATGAATACATCTCTAAAGAGCTTTGATGCTAATTT
              AGTTGTCCATGAATACATCTCTAAAGAGCTTTGATGCTAATTTAGTCAATTTTCAATACTGTA
                  ACATCTCTTAAAGAGCTTTGATGCTAATTTAGTCAAATT
                         AGAGCTTTGATGCTAATTTAGTCAAATTTTCAATACTGTACAATCTTCTCTAG
Final alignment

AATGGATGGCAAAATAGTTGTTCCATGAATACATCTC-TAAAGAGCTTTGATGCTAATTTAGTCAAATTTTCAATACTGT
.....................................*...........
          ...G.......................*......................
               .....*................*..........................*...............
                              .......T...............................
                                         .......................................

Consensus without Approximate Positions

When setting the useContigID member of the ConsensusAlignmentOptions object to false then we can also omit adding approximate positions for the reads. In this case, the consensus step performs an all-to-all alignment of all reads and then computes a consensus multi-read alignment for all of them. This is demonstrated by the following program.

#include <iostream>

#include <seqan/store.h>
#include <seqan/consensus.h>

using namespace seqan;

int main()
{
    FragmentStore<> store;

    // Actual read layout.
    //
    // AATGGATGGCAAAATAGTTGTTCCATGAATACATCTCTAAAGAGCTTT
    //           AAAATAGTTGTTCCATGAATACATCTCTAAAGAGCTTTGATGCTAATTT
    //                AGTTGTTCCATGAATACATCTCTAAAGAGCTTTGATGCTAATTTAGTCAAATTTTCAATACTGTA
    //                               ACATCTCTAAAGAGCTTTGATGCTAATTTAGTCAAATT
    //                                         AGAGCTTTGATGCTAATTTAGTCAAATTTTCAATACTGTACAATCTTCTCTAG

    // Append reads (includes small errors).
    appendRead(store, "AATGGATGGCAAAATAGTTGTTCCATGAATACATCTCTAAAGAGCTTT");
    appendRead(store, "AAAGTAGTTGTTCCATGAATACATCTCTAAAGAGCTTTGATGCTAATTT");
    appendRead(store, "AGTTGTCCATGAATACATCTCTAAAGAGCTTTGATGCTAATTTAGTCAATTTTCAATACTGTA");
    appendRead(store, "ACATCTCTTAAAGAGCTTTGATGCTAATTTAGTCAAATT");
    appendRead(store, "AGAGCTTTGATGCTAATTTAGTCAAATTTTCAATACTGTACAATCTTCTCTAG");

    ConsensusAlignmentOptions options;
    options.useContigID = false;
    consensusAlignment(store, options);

    std::cout << "Final alignment\n\n";
    AlignedReadLayout layout;
    layoutAlignment(layout, store);
    printAlignment(std::cout, layout, store, /*contigID=*/ 0, /*beginPos=*/ 0, /*endPos=*/ 80, 0, 30);

    return 0;
}

Here is this modified programs’ output:

Final alignment

AATGGATGGCAAAATAGTTGTTCCATGAATACATCTC-TAAAGAGCTTTGATGCTAATTTAGTCAAATTTTCAATACTGT
.....................................*...........
          ...G.......................*......................
               .....*................*..........................*...............
                              .......T...............................
                                         .......................................