Accessing Index Fibres

Overview

Basically each index consists of a set of tables, called fibres. The set of available fibres of an index Index<TText, TSpec> depends on the index specialization TSpec.

Fibres Description
EsaText The original text the index should be based on. Can be either a sequence or a StringSet.
EsaSA The suffix array stores the begin positions of all suffixes in lexicographical order.
EsaLcp The lcp table stores at position i the length of the longest common prefix between suffix with rank i and rank i+1.
EsaChildtab See [AKO04]
EsaBwt The Burrows-Wheeler table stores at position i the character left of the suffix with rank i.
EsaRawText Virtually concatenates all strings of the EsaText fibre.
WOTDIndexFibres Description
WotdText The original text the index should be based on.
WotdSA The suffix array stores the begin positions of all suffixes in lexicographical order.
WotdDir [GKS03]
WotdRawText Virtually concatenates all strings of the WotdText fibre.
DfiIndexFibres Description Type
DfiText The original text the index should be based on. First template argument of the Index. Can be either a sequence or a StringSet.
DfiSA The suffix array stores the begin positions of all suffixes in lexicographical order. String over the SAValue type of the index.
DfiDir See [GKS03]. String over the Size type of the index.
DfiRawText Virtually concatenates all strings of the DfiText fibre. ContainerConcept over the alphabet of the text.
QGramIndexFibres Description Type
QGramText The original text the index should be based on. First template argument of the Index. Can be either a sequence or a StringSet.
QGramShape The q-gram Shape. Specified by the first template argument of IndexQGram.
QGramSA The suffix array stores the begin positions of all suffixes in lexicographical order. String over the SAValue type of the index.
QGramDir The directory maps q-gram hash values to start indices in the QGramSA fibre. String over the Size type of the index.
QGramCounts Stores numbers of occurrences per sequence of each q-gram in pairs (seqNo,count), count>0. String over Pair of the Size type of the index.
QGramCountsDir The counts directory maps q-gram hash values to start indices in the QGramCounts fibre. String over the Size type of the index.
QGramBucketMap Used by the OpenAddressingQGramIndex index to store the hash value occupancy in the QGramDir fibre. String over the Value type of the shape.
QGramRawText Virtually concatenates all strings of the QGramText fibre. ContainerConcept over the alphabet of the text.
Pizza & Chili Index Fibres Description Type
PizzaChiliText The original text the index should be based on. First template argument of the Index. Must be a sequence (no support for StringSet).
PizzaChiliCompressed Specialization dependent data structure to store the compressed index. Depends on the compressed index.

The first column in each table above contains the tags to select the corresponding fibre. Most of these tags are aliases for the same tag, e.g. EsaSA, QGramSA, ... are aliases for FibreSA. If you write an algorithm that is generic in the type of index can use FibreText to specify the fibre that stores the index text.

Creation

In most cases you don’t need to create the fibres of an index by hand. Most algorithms and data structures create them automatically, e.g. Finder or VSTreeIterator. If you want to specify a certain index construction algorithm, have to recreate a fibre or manually access a fibre you can recreate or create on-demand a fibre by indexCreate and indexRequire. If your algorithm should behave differently depending on the presence or absence of a fibre (and the fibre should then not be created), you can test for presence by indexSupplied.

Access

The type of each fibre can be determined by the metafunction Fibre. To access a fibre you can use the function getFibre whose return type is the result of Fibre. The second argument of both functions is a tag to select a specific fibre. See the first column in the tables above. One fibre in every index is the text to be indexed itself. This fibre can be assigned during the construction. For the ease of use, there exist shortcuts to access frequently used fibres:

Shortcut Expands To ...
indexBucketMap(index) getFibre(index, FibreBucketMap())
indexBwt(index) getFibre(index, FibreBwt())
indexChildtab(index) getFibre(index, FibreChildtab())
indexCounts(index) getFibre(index, FibreCounts())
indexCountsDir(index) getFibre(index, FibreCountsDir())
indexLcp(index) getFibre(index, FibreLcp())
indexRawText(index) getFibre(index, FibreRawText())
indexSA(index) getFibre(index, FibreSA())
indexShape(index) getFibre(index, FibreShape())
indexText(index) getFibre(index, FibreText())

and to access a single values:

Shortcut Expands To ...
bwtAt(pos, index) indexBwt(index)[pos]
childAt(pos, index) indexChildtab(index)[pos]
dirAt(pos, index) indexDir(index)[pos]
lcpAt(pos, index) indexLcp(index)[pos]
rawtextAt(pos, index) indexRawText(index)[pos]
saAt(pos, index) indexSA(index)[pos]
textAt(pos, index) indexText(index)[pos]

Please note that textAt can also be used if the index text is a StringSet. pos can then be a SAValue.