Given a set of $n$ colored points $P \subset \mathbb{R}^d$, we wish to store $P$ such that, given a query region $Q$, we can efficiently report the colors of the points appearing in $Q$, along with their frequencies. This is the \emph{color frequency reporting} problem. We study the case where the query regions $Q$ are axis-aligned boxes or dominance ranges. If $Q$ contains $k$ colors, the main goal is to achieve ``strictly output sensitive'' query time $O(f(n) + k)$. Firstly, we show that, for every $s \in \{ 2, \dots, n \}$, there exists a simple $O(ns\log_s n)$-size data structure for points in $\mathbb{R}^2$ that answers frequency reporting queries in $O(\log n + k\log_s n)$ time. Secondly, we give a lower bound for the weighted version of the problem in the arithmetic model of computation, proving that with $O(m)$ space one cannot achieve query times better than $\Omega\left(\varphi\frac{\log (n / \varphi)}{\log (m / n)}\right)$, where $\varphi$ is the number of possible colors. This means that our data structure is near-optimal. We extend these results to higher dimensions as well. Thirdly, we present a transformation that reduces the space usage of the aforementioned data structure to $O(n(s \varphi)^\varepsilon \log_s n)$. Finally, we give an $O(n^{1+\varepsilon} + m \log n + K)$-time algorithm that answers $m$ dominance queries in $\mathbb{R}^2$ with total output complexity $K$, while using only linear working space.
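To make the problem statement concrete, the following is a minimal brute-force sketch of a color frequency reporting query. It is not the data structure described above: it scans all $n$ points per query, so it only illustrates what a correct answer looks like. The function names `color_frequencies` and `dominance_frequencies`, the `(coordinates, color)` point representation, the use of closed ranges, and the convention that a dominance query reports the points dominated by the query point are all assumptions made for this illustration.

```python
from collections import Counter

def color_frequencies(points, lo, hi):
    """Return {color: count} over points inside the closed box [lo, hi].

    points: iterable of (coords, color) pairs; coords, lo and hi are
    sequences of equal length d. This is an O(n * d) linear scan,
    used purely as a reference baseline.
    """
    counts = Counter()
    for coords, color in points:
        # A point is in the box if every coordinate lies within bounds.
        if all(l <= x <= h for l, x, h in zip(lo, coords, hi)):
            counts[color] += 1
    return dict(counts)

def dominance_frequencies(points, q):
    """Return {color: count} over points whose coordinates are all <= q.

    Assumption: a dominance range is the box (-inf, q_1] x ... x (-inf, q_d].
    """
    lo = [float("-inf")] * len(q)
    return color_frequencies(points, lo, q)

# Hypothetical example data: four colored points in the plane.
points = [((1, 1), "red"), ((2, 3), "blue"), ((3, 2), "red"), ((5, 5), "green")]
box_result = color_frequencies(points, (0, 0), (3, 3))
dom_result = dominance_frequencies(points, (2, 3))
```

A baseline like this can serve as a correctness oracle when testing a faster structure: both should return the same color-to-frequency mapping for every query.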