Accessing Index Fibres

Overview

Basically each index consists of a set of tables, called fibres. The set of available fibres of an index Index<TText, TSpec> depends on the index specialization TSpec.

Fibres

Description

EsaText

The original text the index should be based on. Can be either a sequence or a StringSet.

EsaSA

The suffix array stores the begin positions of all suffixes in lexicographical order.

EsaLcp

The lcp table stores at position i the length of the longest common prefix between suffix with rank i and rank i+1.

EsaChildtab

See [AKO04]

EsaBwt

The Burrows-Wheeler table stores at position i the character left of the suffix with rank i.

EsaRawText

Virtually concatenates all strings of the EsaText fibre.

WOTDIndexFibres

Description

WotdText

The original text the index should be based on.

WotdSA

The suffix array stores the begin positions of all suffixes in lexicographical order.

WotdDir

[GKS03]

WotdRawText

Virtually concatenates all strings of the WotdText fibre.

DfiIndexFibres

Description

Type

DfiText

The original text the index should be based on.

First template argument of the Index. Can be either a sequence or a StringSet.

DfiSA

The suffix array stores the begin positions of all suffixes in lexicographical order.

String over the SAValue type of the index.

DfiDir

See [GKS03].

String over the Size type of the index.

DfiRawText

Virtually concatenates all strings of the DfiText fibre.

ContainerConcept over the alphabet of the text.

QGramIndexFibres

Description

Type

QGramText

The original text the index should be based on.

First template argument of the Index. Can be either a sequence or a StringSet.

QGramShape

The q-gram Shape.

Specified by the first template argument of IndexQGram.

QGramSA

The suffix array stores the begin positions of all suffixes in lexicographical order.

String over the SAValue type of the index.

QGramDir

The directory maps q-gram hash values to start indices in the QGramSA fibre.

String over the Size type of the index.

QGramCounts

Stores numbers of occurrences per sequence of each q-gram in pairs (seqNo,count), count>0.

String over Pair of the Size type of the index.

QGramCountsDir

The counts directory maps q-gram hash values to start indices in the QGramCounts fibre.

String over the Size type of the index.

QGramBucketMap

Used by the OpenAddressingQGramIndex index to store the hash value occupancy in the QGramDir fibre.

String over the Value type of the shape.

QGramRawText

Virtually concatenates all strings of the QGramText fibre.

ContainerConcept over the alphabet of the text.

The first column in each table above contains the tags to select the corresponding fibre. Most of these tags are aliases for the same tag, e.g. EsaSA, QGramSA, … are aliases for FibreSA. If you write an algorithm that is generic in the type of index, you can use FibreText to specify the fibre that stores the index text.

Creation

In most cases you don’t need to create the fibres of an index by hand. Most algorithms and data structures create them automatically, e.g. Finder or VSTreeIterator. If you want to specify a certain index construction algorithm, have to recreate a fibre or manually access a fibre you can recreate or create on-demand a fibre by indexCreate and indexRequire. If your algorithm should behave differently depending on the presence or absence of a fibre (and the fibre should then not be created), you can test for presence by indexSupplied.

Access

The type of each fibre can be determined by the metafunction Fibre. To access a fibre you can use the function getFibre whose return type is the result of Fibre. The second argument of both functions is a tag to select a specific fibre. See the first column in the tables above. One fibre in every index is the text to be indexed itself. This fibre can be assigned during the construction. For the ease of use, there exist shortcuts to access frequently used fibres:

Shortcut

Expands To …

indexBucketMap(index)

getFibre(index, FibreBucketMap())

indexBwt(index)

getFibre(index, FibreBwt())

indexChildtab(index)

getFibre(index, FibreChildtab())

indexCounts(index)

getFibre(index, FibreCounts())

indexCountsDir(index)

getFibre(index, FibreCountsDir())

indexLcp(index)

getFibre(index, FibreLcp())

indexRawText(index)

getFibre(index, FibreRawText())

indexSA(index)

getFibre(index, FibreSA())

indexShape(index)

getFibre(index, FibreShape())

indexText(index)

getFibre(index, FibreText())

and to access a single value:

Shortcut

Expands To …

bwtAt(pos, index)

indexBwt(index)[pos]

childAt(pos, index)

indexChildtab(index)[pos]

dirAt(pos, index)

indexDir(index)[pos]

lcpAt(pos, index)

indexLcp(index)[pos]

rawtextAt(pos, index)

indexRawText(index)[pos]

saAt(pos, index)

indexSA(index)[pos]

textAt(pos, index)

indexText(index)[pos]

Please note that textAt can also be used if the index text is a StringSet. pos can then be a SAValue.