Recently, Conte et al. generalized the longest common prefix (LCP) array from strings to Wheeler DFAs and showed that it can be used to efficiently compute matching statistics on a Wheeler DFA [DCC 2023]. However, storing the LCP array requires $ O(n \log n) $ bits, where $ n $ is the number of states, while compact representations of Wheeler DFAs often require much less space. In particular, the BOSS representation of a de Bruijn graph requires only a linear number of bits when the alphabet size is constant. In this paper, we propose a sampling technique that allows accessing an entry of the LCP array in logarithmic time while storing only a linear number of bits. We use this technique to provide a space-time trade-off for computing matching statistics on a Wheeler DFA. In addition, we show that by augmenting the BOSS representation of a $ k $-th order de Bruijn graph with a linear number of bits, we can navigate the underlying variable-order de Bruijn graph in time logarithmic in $ k $, improving a previous bound by Boucher et al. that was linear in $ k $ [DCC 2015].
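For readers unfamiliar with the object being generalized, the following is a minimal sketch of the classical LCP array on a plain string, not of the Wheeler DFA construction or the sampling technique proposed here. The function names `suffix_array` and `lcp_array` are illustrative assumptions, and the quadratic-time construction is chosen for clarity only.

```python
def suffix_array(s):
    """Return the starting positions of the suffixes of s in lexicographic order.

    Naive construction by sorting suffix slices; fine for illustration only.
    """
    return sorted(range(len(s)), key=lambda i: s[i:])


def lcp_array(s, sa):
    """Return the LCP array for string s and its suffix array sa.

    Entry i holds the length of the longest common prefix of the suffixes
    starting at sa[i - 1] and sa[i]; entry 0 is set to 0 by convention.
    """
    def common_prefix_len(a, b):
        n = 0
        while a + n < len(s) and b + n < len(s) and s[a + n] == s[b + n]:
            n += 1
        return n

    return [0] + [common_prefix_len(sa[i - 1], sa[i]) for i in range(1, len(sa))]


if __name__ == "__main__":
    text = "banana"
    sa = suffix_array(text)
    print(sa, lcp_array(text, sa))
```

Stored explicitly, such an array holds one integer per entry, which is where the $ O(n \log n) $-bit cost mentioned above comes from.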