Existing methods for bulk loading disk-based multidimensional points involve multiple applications of external sorting. In this paper, we propose techniques that rely on linear scans and are therefore significantly faster. The resulting FMBI index possesses several desirable properties, including almost full and square nodes with zero overlap, and has excellent query performance. As a second contribution, we develop an adaptive version, AMBI, which uses the query workload to build a partial index only for the parts of the data space that contain query results. Finally, we extend FMBI and AMBI to parallel bulk loading and query processing in distributed systems. An extensive experimental evaluation with real datasets confirms that FMBI and AMBI clearly outperform competitors in terms of combined index construction and query processing cost, sometimes by orders of magnitude.