A learned multi-dimensional index is a data structure that answers multi-dimensional orthogonal range queries efficiently by learning the data distribution with machine learning models. A known limitation of such indexes is that search performance degrades significantly when update operations skew the distribution of the stored data. To overcome this problem, we propose FlexFlood, a flexible variant of Flood. FlexFlood partially reconstructs its internal structure when the data distribution becomes skewed. Moreover, FlexFlood is the first learned multi-dimensional index that guarantees a bound on the time complexity of the update operation. Through experiments on both artificial and real-world data, we demonstrate that, once the data distribution has become skewed, searches with FlexFlood are up to 10 times faster than with existing methods. We also find that partial reconstruction takes only about twice as long as naive data updating.
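To make the idea of partial reconstruction concrete, the following is a minimal toy sketch in Python, not the FlexFlood algorithm itself. It assumes a two-dimensional index that partitions the x-axis into columns and keeps each column sorted by y, and it rebuilds only the one column that overflows after an insertion. The class name `ToyGridIndex`, the `max_cell_size` threshold, and the median-split rule are all hypothetical choices made for illustration; no learned model is involved.

```python
import bisect


class ToyGridIndex:
    """Toy 2-D index: columns partition the x-axis; each column is sorted by y."""

    def __init__(self, max_cell_size=4):
        self.max_cell_size = max_cell_size  # hypothetical skew threshold
        self.bounds = []   # bounds[i] is the lower x-boundary of column i + 1
        self.cells = [[]]  # cells[i] holds (y, x) tuples, kept sorted by y

    def _column(self, x):
        # Number of boundaries <= x gives the column that covers x.
        return bisect.bisect_right(self.bounds, x)

    def insert(self, x, y):
        i = self._column(x)
        bisect.insort(self.cells[i], (y, x))
        if len(self.cells[i]) > self.max_cell_size:
            self._split(i)  # rebuild only the overflowing column

    def _split(self, i):
        cell = self.cells[i]
        xs = sorted(x for _, x in cell)
        pivot = xs[len(xs) // 2]
        if pivot == xs[0]:
            return  # left half would be empty; leave the column as it is
        # Filtering preserves the y-order inside each half.
        left = [p for p in cell if p[1] < pivot]
        right = [p for p in cell if p[1] >= pivot]
        self.cells[i:i + 1] = [left, right]
        self.bounds.insert(i, pivot)

    def query(self, x_lo, x_hi, y_lo, y_hi):
        """Return all (x, y) with x_lo <= x <= x_hi and y_lo <= y <= y_hi."""
        out = []
        for i in range(self._column(x_lo), self._column(x_hi) + 1):
            cell = self.cells[i]
            start = bisect.bisect_left(cell, (y_lo, float("-inf")))
            for y, x in cell[start:]:
                if y > y_hi:
                    break
                if x_lo <= x <= x_hi:
                    out.append((x, y))
        return out


# Usage: insert points one at a time, then run an orthogonal range query.
points = [(i % 7 + i * 0.01, (i * 37) % 11) for i in range(50)]
index = ToyGridIndex(max_cell_size=4)
for px, py in points:
    index.insert(px, py)
hits = index.query(1.0, 3.5, 2, 8)
```

In this sketch an insertion touches a single column, and a split rewrites only that column, so regions of the index that have not become dense are left untouched. This is the sense in which the reconstruction is partial, as opposed to rebuilding the whole structure after the distribution shifts.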