Pattern Matching

Learning Objective
In this tutorial you will learn how to use the SeqAn classes finder and pattern to search a known pattern in a string or StringSet.
Difficulty
Average
Duration
40 min
Prerequisites
Sequences, Indices

Pattern matching is about searching a known string or StringSet (needle) in another string or StringSet (haystack). This tutorial will introduce you into the SeqAn classes finder and pattern. It will demonstrate how to use the spezializations of the class finder to perform either an online search or an index based seach. And you will learn how to specify the search algorithm, which can be exact or approximate.

Overview

In the case of approximate searching errors are allowed, which are either only mismatches or also indels. Additionally there are filtration algorithms which return potential matches, i.e. haystack segments possibly containing a pattern match. All searching is done by calling the function find, which takes at least two arguments:

  1. A Finder that stores all necessary information about the haystack and the last found position of the needle within the haystack.
  2. A Pattern that stores all information about the needle. Some variants of find support further arguments. The Finder and Pattern classes expect the underlying haystack and needle types as first template arguments. In addition, a second template argument specifies the search algorithm.

Each call of find finds only one match (or potential match) of the needle within the haystack. The Finder can be asked for the begin and end position of the last found match. The Pattern can be asked for the number of the found sequence if the needle is a StringSet. Subsequent calls of find can be used to find more occurrences of the needle, until no more occurrences can be found and find returns false.

In general, search algorithms can be divided into algorithms that preprocess the needle (online search) or preprocess the haystack (index search).

comments powered by Disqus