On Range Summary Queries

We study the query version of the approximate heavy hitter and quantile problems. In the former problem, the input is a parameter $\varepsilon$ and a set $P$ of $n$ points in $\mathbb{R}^d$, where each point is assigned a color from a set $C$. We want to build a structure such that, given any geometric range $\gamma$, we can efficiently find a list of approximate heavy hitters in $\gamma\cap P$, i.e., colors that appear at least $\varepsilon |\gamma \cap P|$ times in $\gamma \cap P$, as well as their frequencies up to an additive error of $\varepsilon |\gamma \cap P|$. In the latter problem, each point is assigned a weight from a totally ordered universe, and the query must output a sequence $S$ of $1+1/\varepsilon$ weights such that the $i$-th weight in $S$ has approximate rank $i\varepsilon|\gamma\cap P|$, meaning rank $i\varepsilon|\gamma\cap P|$ up to an additive error of $\varepsilon|\gamma\cap P|$. Previously, optimal results were known only in 1D [WY11], while only a few sub-optimal methods were available in higher dimensions [AW17, ACH+12]. We study the problems for 3D halfspace and dominance queries. We consider the real RAM model with integer registers of size $w=\Theta(\log n)$ bits. For dominance queries, we show optimal solutions for both the heavy hitter and quantile problems: using linear space, we can answer both queries in time $O(\log n + 1/\varepsilon)$. Since the output size is $O(1/\varepsilon)$, after the initial $O(\log n)$ search time our structure spends $O(1)$ time on average per reported heavy hitter or quantile. For the more general halfspace heavy hitter queries, the same optimal query time can be achieved by increasing the space by an extra $\log_w\frac{1}{\varepsilon}$ (resp. $\log\log_w\frac{1}{\varepsilon}$) factor in 3D (resp. 2D). By spending extra $\log^{O(1)}\frac{1}{\varepsilon}$ factors in time and space, we can also support quantile queries.
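To make the two query definitions concrete, the following is a minimal brute-force sketch in Python. It is not the data structure described above: it scans all points in $O(n)$ time per query and returns exact answers, which trivially satisfy the additive error bound of $\varepsilon|\gamma\cap P|$. All names (`dominated_by`, `heavy_hitters`, `quantile_summary`) are hypothetical, and the sketch assumes that $1/\varepsilon$ is an integer and that points are given as (coordinates, label) pairs.

```python
from collections import Counter
from math import ceil


def dominated_by(q):
    """Return a predicate for the 3D dominance range of q (hypothetical helper):
    a point p is in the range if p is coordinate-wise <= q."""
    return lambda p: all(pi <= qi for pi, qi in zip(p, q))


def heavy_hitters(colored_points, in_range, eps):
    """Brute-force reference: colors occurring at least eps * m times among the
    m points inside the range, mapped to their exact frequencies."""
    colors = [color for point, color in colored_points if in_range(point)]
    m = len(colors)
    counts = Counter(colors)
    return {color: k for color, k in counts.items() if k >= eps * m}


def quantile_summary(weighted_points, in_range, eps):
    """Brute-force reference: a list of 1 + 1/eps weights whose i-th entry has
    rank ceil(i * eps * m) among the m weights inside the range (ranks are
    clamped to [1, m]). Assumes 1/eps is an integer."""
    weights = sorted(w for point, w in weighted_points if in_range(point))
    m = len(weights)
    if m == 0:
        return []
    steps = round(1 / eps)
    summary = []
    for i in range(steps + 1):
        rank = min(m, max(1, ceil(i * eps * m)))
        summary.append(weights[rank - 1])  # ranks are 1-based
    return summary
```

A query structure in the sense of the abstract would have to return answers within the same error bounds as these functions, but without scanning every point of $P$ for each range.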
