In data exploration, users need to analyze large data files quickly, aiming to minimize data-to-analysis time. While recent adaptive indexing approaches address this need, there are cases where they perform poorly: during the initial queries, in regions with a high density of objects, and on very large files processed on commodity hardware. This work introduces an approach for adaptive indexing driven by both the query workload and user-defined accuracy constraints, in order to support approximate query answering. The approach is based on partial index adaptation, which reduces the costs associated with reading data files and refining indexes. We leverage a hierarchical tile-based indexing scheme and its stored metadata to provide efficient query evaluation while ensuring that accuracy stays within user-specified bounds. Our preliminary evaluation demonstrates an improvement in query evaluation time, especially during initial user exploration.
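To make the idea of partial index adaptation more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a sum aggregate over non-negative values, in-memory points standing in for the raw data file, and a relative error bound `eps`. The names `Tile`, `approx_sum`, `leaf_size`, and `max_depth` are hypothetical. Tiles fully inside the query window are answered from stored metadata, partially overlapping tiles contribute an interval, and a tile is split only while the interval is wider than the bound allows.

```python
from dataclasses import dataclass, field


@dataclass
class Tile:
    # Half-open tile [x0, x1) x [y0, y1); points are (x, y, value) with value >= 0.
    x0: float
    y0: float
    x1: float
    y1: float
    points: list
    depth: int = 0
    children: list = field(default_factory=list)

    def __post_init__(self):
        # Stored metadata: lets a query use the tile without scanning its objects.
        self.total = sum(v for _, _, v in self.points)

    def split(self):
        # Refine this tile into four quadrants (one adaptation step).
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        boxes = [(self.x0, self.y0, mx, my), (mx, self.y0, self.x1, my),
                 (self.x0, my, mx, self.y1), (mx, my, self.x1, self.y1)]
        self.children = [
            Tile(a, b, c, d,
                 [p for p in self.points if a <= p[0] < c and b <= p[1] < d],
                 self.depth + 1)
            for a, b, c, d in boxes
        ]


def contained(t, q):
    return q[0] <= t.x0 and q[1] <= t.y0 and t.x1 <= q[2] and t.y1 <= q[3]


def disjoint(t, q):
    return t.x1 <= q[0] or q[2] <= t.x0 or t.y1 <= q[1] or q[3] <= t.y0


def approx_sum(root, q, eps, leaf_size=4, max_depth=16):
    """Return (estimate, lower, upper) for the sum of values in window q.

    Tiles are refined only until the half-width of [lower, upper]
    is at most eps * estimate (assumed form of the accuracy constraint).
    """
    while True:
        exact, partial, stack = 0.0, [], [root]
        while stack:
            t = stack.pop()
            if disjoint(t, q):
                continue
            if t.children:
                stack.extend(t.children)
            elif contained(t, q):
                exact += t.total  # answered from metadata alone
            elif len(t.points) <= leaf_size or t.depth >= max_depth:
                # Small or deep tile: scan its objects instead of splitting further.
                exact += sum(v for x, y, v in t.points
                             if q[0] <= x < q[2] and q[1] <= y < q[3])
            else:
                partial.append(t)
        lower = exact
        upper = exact + sum(t.total for t in partial)  # values are non-negative
        estimate = (lower + upper) / 2
        if not partial or upper - lower <= 2 * eps * estimate:
            return estimate, lower, upper
        # Partial adaptation: refine only the tile contributing most uncertainty.
        max(partial, key=lambda t: t.total).split()


# Usage: a 10 x 10 grid of unit-valued points, queried with a 10% error bound.
pts = [(i + 0.5, j + 0.5, 1.0) for i in range(10) for j in range(10)]
root = Tile(0.0, 0.0, 10.0, 10.0, pts)
estimate, lower, upper = approx_sum(root, (0.0, 0.0, 3.0, 3.0), eps=0.1)
```

In this sketch a looser `eps` leaves more tiles unsplit, so early queries touch fewer objects, while a tighter `eps` forces further refinement only in the tiles that overlap the query window.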