We consider succinct data structures for representing a set of $n$ horizontal line segments in the plane, given in rank space, to support \emph{segment access}, \emph{segment selection}, and \emph{segment rank} queries. A segment access query finds the segment $(x_1, x_2, y)$ given its $y$-coordinate (the $y$-coordinates of the segments are distinct). A segment selection query, given $x$ and $j$, finds the $j$th smallest segment (the segment with the $j$th smallest $y$-coordinate) among the segments crossing the vertical line at $x$-coordinate $x$. A segment rank query, given $x$ and $y$, counts the segments with $y$-coordinate at most $y$ that cross the vertical line at $x$-coordinate $x$. This problem is a central component in compressed data structures for persistent strings supporting random access. Our main result is a data structure using $2n\lg{n} + O(n\lg{n}/\lg{\lg{n}})$ bits of space and $O(\lg{n}/\lg{\lg{n}})$ query time for all operations. We show that this space bound is optimal up to lower-order terms. We also show that the query time for segment rank is optimal. The query time for segment selection is also optimal by a previous bound. To obtain our results, we present a novel \emph{segment wavelet tree} data structure of independent interest, which is inspired by and extends the classic wavelet tree for sequences. This leads to a simple, succinct solution with $O(\lg{n})$ query time. We then extend this solution to obtain optimal query time. Our space lower bound follows from a simple counting argument, and our lower bound for segment rank is obtained by a reduction from 2-dimensional counting.
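To make the three query definitions concrete, the following is a naive reference sketch of their semantics only. It is not the segment wavelet tree and makes no attempt at succinct space or fast queries. The function names, the 1-based index $j$, and the reading of "crossing" as $x_1 \le x \le x_2$ (closed endpoints) are illustrative assumptions, not taken from the paper.

```python
from typing import List, Optional, Tuple

# A segment is (x1, x2, y); y-coordinates are assumed distinct.
Segment = Tuple[int, int, int]


def crossing(segments: List[Segment], x: int) -> List[Segment]:
    """Segments crossing the vertical line at x, sorted by y.

    Assumption: a segment crosses the line when x1 <= x <= x2.
    """
    hits = [s for s in segments if s[0] <= x <= s[1]]
    return sorted(hits, key=lambda s: s[2])


def segment_access(segments: List[Segment], y: int) -> Optional[Segment]:
    """Return the segment with the given y-coordinate, or None if absent."""
    for s in segments:
        if s[2] == y:
            return s
    return None


def segment_select(segments: List[Segment], x: int, j: int) -> Optional[Segment]:
    """Return the segment with the j-th smallest y (1-based) among those crossing x."""
    hits = crossing(segments, x)
    return hits[j - 1] if 1 <= j <= len(hits) else None


def segment_rank(segments: List[Segment], x: int, y: int) -> int:
    """Count segments crossing x whose y-coordinate is at most y."""
    return sum(1 for s in crossing(segments, x) if s[2] <= y)


# Small hypothetical instance for illustration.
example = [(1, 4, 1), (2, 3, 2), (5, 6, 3), (3, 6, 4)]
```

Each query here scans all $n$ segments, so it serves only as a correctness baseline against which a succinct structure could be checked.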