Accessing Index Fibres¶
Overview¶
Basically each index consists of a set of tables, called fibres. The set of available fibres of an index Index<TText, TSpec> depends on the index specialization TSpec.
Fibres | Description |
---|---|
EsaText | The original text the index should be based on. Can be either a sequence or a StringSet. |
EsaSA | The suffix array stores the begin positions of all suffixes in lexicographical order. |
EsaLcp | The lcp table stores at position i the length of the longest common prefix between suffix with rank i and rank i+1. |
EsaChildtab | See [AKO04] |
EsaBwt | The Burrows-Wheeler table stores at position i the character left of the suffix with rank i. |
EsaRawText | Virtually concatenates all strings of the EsaText fibre. |
WOTDIndexFibres | Description |
---|---|
WotdText | The original text the index should be based on. |
WotdSA | The suffix array stores the begin positions of all suffixes in lexicographical order. |
WotdDir | [GKS03] |
WotdRawText | Virtually concatenates all strings of the WotdText fibre. |
DfiIndexFibres | Description | Type |
---|---|---|
DfiText | The original text the index should be based on. | First template argument of the Index. Can be either a sequence or a StringSet. |
DfiSA | The suffix array stores the begin positions of all suffixes in lexicographical order. | String over the SAValue type of the index. |
DfiDir | See [GKS03]. | String over the Size type of the index. |
DfiRawText | Virtually concatenates all strings of the DfiText fibre. | ContainerConcept over the alphabet of the text. |
QGramIndexFibres | Description | Type |
---|---|---|
QGramText | The original text the index should be based on. | First template argument of the Index. Can be either a sequence or a StringSet. |
QGramShape | The q-gram Shape. | Specified by the first template argument of IndexQGram. |
QGramSA | The suffix array stores the begin positions of all suffixes in lexicographical order. | String over the SAValue type of the index. |
QGramDir | The directory maps q-gram hash values to start indices in the QGramSA fibre. | String over the Size type of the index. |
QGramCounts | Stores numbers of occurrences per sequence of each q-gram in pairs (seqNo,count), count>0. | String over Pair of the Size type of the index. |
QGramCountsDir | The counts directory maps q-gram hash values to start indices in the QGramCounts fibre. | String over the Size type of the index. |
QGramBucketMap | Used by the OpenAddressingQGramIndex index to store the hash value occupancy in the QGramDir fibre. | String over the Value type of the shape. |
QGramRawText | Virtually concatenates all strings of the QGramText fibre. | ContainerConcept over the alphabet of the text. |
Pizza & Chili Index Fibres | Description | Type |
---|---|---|
PizzaChiliText | The original text the index should be based on. | First template argument of the Index. Must be a sequence (no support for StringSet). |
PizzaChiliCompressed | Specialization dependent data structure to store the compressed index. | Depends on the compressed index. |
The first column in each table above contains the tags to select the corresponding fibre. Most of these tags are aliases for the same tag, e.g. EsaSA, QGramSA, ... are aliases for FibreSA. If you write an algorithm that is generic in the type of index can use FibreText to specify the fibre that stores the index text.
Creation¶
In most cases you don’t need to create the fibres of an index by hand. Most algorithms and data structures create them automatically, e.g. Finder or VSTreeIterator. If you want to specify a certain index construction algorithm, have to recreate a fibre or manually access a fibre you can recreate or create on-demand a fibre by indexCreate and indexRequire. If your algorithm should behave differently depending on the presence or absence of a fibre (and the fibre should then not be created), you can test for presence by indexSupplied.
Access¶
The type of each fibre can be determined by the metafunction Fibre. To access a fibre you can use the function getFibre whose return type is the result of Fibre. The second argument of both functions is a tag to select a specific fibre. See the first column in the tables above. One fibre in every index is the text to be indexed itself. This fibre can be assigned during the construction. For the ease of use, there exist shortcuts to access frequently used fibres:
and to access a single values:
Please note that textAt can also be used if the index text is a StringSet. pos can then be a SAValue.