This paper addresses the Counting Long Aggregated Visits problem, defined as follows. We are given $n$ users and $m$ regions, where each user spends some amount of time visiting some of the regions. Given a parameter $k$ and a query consisting of a subset of $r$ regions, the task is to count the distinct users whose aggregate time spent across the query regions is at least $k$. The problem is motivated by queries that arise in the analysis of large-scale mobility datasets. We present several exact and approximate data structures for counting long aggregated visits, as well as conditional and unconditional lower bounds. First, we describe an exact data structure that exhibits a space-time tradeoff, together with efficient approximate solutions based on sampling and sketching techniques. We then study the problem in geometric settings, where regions are points in $\mathbb{R}^d$ and queries are hyperrectangles, and derive exact data structures that achieve improved performance in these structured settings.
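To make the query semantics concrete, here is a minimal brute-force sketch in Python. It assumes visits arrive as hypothetical `(user, region, duration)` triples. The names `build_index`, `count_long_visits`, and `estimate_long_visits` are illustrative and are not the paper's data structures. The exact query scans every user with $O(r)$ lookups each, so it takes $O(nr)$ time after preprocessing. The estimator is a generic uniform sample of $s$ users scaled by $n/s$. It stands in for the sampling-based approach mentioned above without claiming to reproduce it.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical input format: one (user_id, region_id, duration) triple per visit.
Visit = Tuple[int, int, float]
Index = Dict[int, Dict[int, float]]


def build_index(visits: Iterable[Visit]) -> Index:
    """Preprocess visits into per-user totals: user -> {region -> total time}."""
    index: Index = defaultdict(lambda: defaultdict(float))
    for user, region, duration in visits:
        index[user][region] += duration
    return index


def aggregate_time(per_region: Dict[int, float], query: Set[int]) -> float:
    """Total time one user spent across the query regions (r dictionary lookups)."""
    return sum(per_region.get(region, 0.0) for region in query)


def count_long_visits(index: Index, query: Set[int], k: float) -> int:
    """Exact answer: number of distinct users whose aggregate time is >= k."""
    return sum(1 for per_region in index.values()
               if aggregate_time(per_region, query) >= k)


def estimate_long_visits(index: Index, query: Set[int], k: float,
                         sample_size: int, seed: int = 0) -> float:
    """Unbiased estimate: test a uniform sample of s users, then scale by n / s."""
    users = list(index)
    if not users:
        return 0.0
    rng = random.Random(seed)
    s = max(1, min(sample_size, len(users)))  # avoid an empty sample
    sample = rng.sample(users, s)
    hits = sum(1 for u in sample if aggregate_time(index[u], query) >= k)
    return hits * len(users) / s


if __name__ == "__main__":
    visits = [(1, 0, 3.0), (1, 1, 2.0), (2, 0, 1.0), (2, 2, 10.0), (3, 1, 5.0)]
    index = build_index(visits)
    print(count_long_visits(index, {0, 1}, 5.0))  # 2 (users 1 and 3)
    print(estimate_long_visits(index, {0, 1}, 5.0, sample_size=2))
```

In this toy data, user 1 qualifies for the query $\{0, 1\}$ with $k = 5$ even though no single region reaches $5$. Only the aggregate $3 + 2$ does, and that is what distinguishes counting long *aggregated* visits from counting long individual visits.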
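In the geometric setting described in the abstract, the query's region set is implicit: it consists of every region whose point lies inside the query hyperrectangle. The sketch below makes two assumptions. Input is a hypothetical list of `(user, region, duration)` triples, and an assumed `coords` map sends each region id to its point in $\mathbb{R}^d$. The function `count_in_box` is an illustrative name. It scans all $N$ visit records per query in $O(Nd)$ time, so it only pins down the semantics. It is a baseline that the structured-space data structures described in the abstract aim to improve on, not one of them.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple

Point = Tuple[float, ...]
Visit = Tuple[int, int, float]  # hypothetical (user_id, region_id, duration)


def in_box(p: Point, lo: Sequence[float], hi: Sequence[float]) -> bool:
    """True if point p lies in the closed hyperrectangle [lo, hi]."""
    return all(l <= x <= h for x, l, h in zip(p, lo, hi))


def count_in_box(visits: Iterable[Visit], coords: Dict[int, Point],
                 lo: Sequence[float], hi: Sequence[float], k: float) -> int:
    """Brute force: users whose total time at regions inside [lo, hi] is >= k.

    Assumes k > 0, since users with no time inside the box are never tracked.
    """
    totals: Dict[int, float] = defaultdict(float)
    for user, region, duration in visits:
        if in_box(coords[region], lo, hi):
            totals[user] += duration
    return sum(1 for t in totals.values() if t >= k)
```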