Indexing a set of strings for prefix search or membership queries is a fundamental task with many applications, such as information retrieval and database systems. A classic abstract data type for modelling such an index is the trie. Because the problem is so fundamental, it has attracted much interest, leading to a variety of trie implementations with different characteristics. A trie implementation that is widely used in practice is the double-array (trie), which consists of merely two integer arrays. While a traversal takes constant time per node visit, the space consumption in computer words can be as large as the product of the number of nodes and the alphabet size. Although several heuristics have been proposed for lowering the space requirements, we are unaware of any theoretical guarantees. In this paper, we study the decision problem of whether there exists a double-array of a given size. To this end, we first draw a connection to the sparse matrix compression problem, which makes our problem NP-complete for alphabet sizes linear in the number of nodes. We further propose a reduction from the restricted directed Hamiltonian path problem, leading to NP-completeness even for logarithmic-sized alphabets.
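To make the data structure concrete, the following is a minimal sketch of a double-array with the commonly described BASE and CHECK arrays, where a transition from slot `s` on character code `c` goes to `t = base[s] + c` and is valid if `check[t] == s`. The function names, the terminator symbol `END`, and the first-fit placement are assumptions made for illustration. First-fit is just one simple heuristic and gives no guarantee on the resulting array length, which is exactly the quantity the decision problem asks about.

```python
from collections import deque

END = "\0"  # assumed terminator symbol; keys must not contain it


def build_double_array(words):
    """Build BASE/CHECK arrays with a simple first-fit heuristic (not optimal)."""
    alphabet = sorted({ch for w in words for ch in w})
    code = {END: 1}  # character codes start at 1, so base[s] + c >= 1
    code.update({ch: i + 2 for i, ch in enumerate(alphabet)})

    # Build an ordinary pointer-based trie first: children[u] maps char -> node.
    children = [{}]
    for w in words:
        u = 0
        for ch in w + END:
            if ch not in children[u]:
                children[u][ch] = len(children)
                children.append({})
            u = children[u][ch]

    # Slot 0 holds the root; check[t] == -1 marks a free slot.
    base, check = [0], [0]
    queue = deque([(0, 0)])  # pairs (trie node, slot in the double-array)
    while queue:
        u, s = queue.popleft()
        codes = [code[ch] for ch in children[u]]
        if not codes:
            continue  # leaf: nothing to place
        b = 0
        while True:
            need = b + max(codes) + 1
            if need > len(check):  # grow both arrays on demand
                base.extend([0] * (need - len(base)))
                check.extend([-1] * (need - len(check)))
            if all(check[b + c] == -1 for c in codes):
                break  # every child slot is free for this base value
            b += 1
        base[s] = b
        for ch, v in children[u].items():
            t = b + code[ch]
            check[t] = s  # slot t now belongs to a child of slot s
            queue.append((v, t))
    return base, check, code


def contains(base, check, code, word):
    """Membership query: constant time per visited node."""
    s = 0
    for ch in word + END:
        c = code.get(ch)
        if c is None:
            return False
        t = base[s] + c
        if t >= len(check) or check[t] != s:
            return False
        s = t
    return True
```

For example, after `base, check, code = build_double_array(["bar", "baz", "foo"])`, the call `contains(base, check, code, "bar")` succeeds while `contains(base, check, code, "ba")` fails, and `len(check)` is the size that a better placement strategy would try to reduce.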