Range queries are a simple and popular type of query used in data retrieval. However, extracting exact and complete information with range queries is costly. As a remedy, previous work proposed a faster approach, {\em approximate} search with range queries, also called single range cover (SRC) search, which can, however, produce false positives. In this work we introduce a new SRC search structure, the $c$-DAG (Directed Acyclic Graph), which provably decreases the average number of false positives by a logarithmic factor while keeping asymptotically the same time and memory complexities as a classic tree structure. A $c$-DAG is a tunable augmentation of the 1D-Tree with denser, overlapping branches ($c \geq 3$ children per node). We perform a competitive analysis of the $c$-DAG with respect to the 1D-Tree and show that, on average, it incurs only an additive constant time overhead while improving the false-positive ratio by a multiplicative logarithmic factor. We also provide a generic framework that extends our results to empirical query distributions, and we demonstrate its effectiveness on the Gowalla dataset. Finally, we quantify and discuss the security and privacy aspects of SRC search on a $c$-DAG versus a 1D-Tree, chiefly the mitigation of structural leakage, which makes the $c$-DAG a good candidate data structure for deployment in privacy-preserving systems (e.g., searchable encryption) and in multimedia retrieval.
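To make SRC search and its false positives concrete, the following Python sketch works under stated assumptions: an integer domain $[0, 2^h)$, a complete binary 1D-Tree whose nodes are aligned dyadic intervals, and, for contrast, a hypothetical overlapping variant that adds half-shifted nodes at every level. The function names (`src_1d_tree`, `src_shifted`, `false_positives`) and the shifted layout are illustrative assumptions; the shifted variant only conveys the intuition behind overlapping branches and is not the $c$-DAG construction from this work.

```python
from typing import Tuple

Interval = Tuple[int, int]  # inclusive [lo, hi]


def _check(a: int, b: int, h: int) -> None:
    if not (0 <= a <= b < 2 ** h):
        raise ValueError("query must satisfy 0 <= a <= b < 2**h")


def src_1d_tree(a: int, b: int, h: int) -> Interval:
    """SRC on a complete binary 1D-Tree over the domain [0, 2**h).

    Nodes at level l cover aligned intervals of width 2**l; the answer is
    the smallest such node that contains the whole query [a, b].
    """
    _check(a, b, h)
    for level in range(h + 1):
        width = 2 ** level
        if a // width == b // width:  # a and b fall in the same aligned node
            lo = (a // width) * width
            return lo, lo + width - 1
    raise AssertionError("unreachable: the root covers the whole domain")


def src_shifted(a: int, b: int, h: int) -> Interval:
    """SRC on a hypothetical overlapping variant (NOT the paper's c-DAG).

    In addition to the aligned nodes, every level l >= 1 also contains
    nodes of width 2**l shifted right by half a node width.
    """
    _check(a, b, h)
    n = 2 ** h
    for level in range(h + 1):
        width = 2 ** level
        if a // width == b // width:  # aligned node, as in the 1D-Tree
            lo = (a // width) * width
            return lo, lo + width - 1
        shift = width // 2
        if level >= 1 and a >= shift:
            # Rightmost shifted node that starts at or before a.
            lo = (a - shift) // width * width + shift
            hi = lo + width - 1
            if b <= hi < n:  # it must reach b and stay inside the domain
                return lo, hi
    raise AssertionError("unreachable: the root covers the whole domain")


def false_positives(cover: Interval, a: int, b: int) -> int:
    """Number of domain values returned by the cover but outside [a, b]."""
    lo, hi = cover
    return (hi - lo + 1) - (b - a + 1)


if __name__ == "__main__":
    h = 4
    queries = [(a, b) for a in range(2 ** h) for b in range(a, 2 ** h)]
    for name, src in (("1D-Tree", src_1d_tree), ("shifted", src_shifted)):
        total = sum(false_positives(src(a, b, h), a, b) for a, b in queries)
        print(f"{name}: average false positives = {total / len(queries):.2f}")
```

Because the shifted variant checks the aligned node first at every level, its cover is never wider than the 1D-Tree's. For the query $[7, 8]$ with $h = 4$, the 1D-Tree must return the root $[0, 15]$ (14 false positives), whereas the half-shifted node $[7, 8]$ covers it exactly. The logarithmic average improvement stated above is a result of this work and is not demonstrated by this toy sketch.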