Efficiently evaluating spatio-textual queries has become increasingly important in applications that need to quickly retrieve geolocated entities associated with textual information, such as location-based services and social networks. To accelerate such queries, several works have proposed combining spatial and textual indices into hybrid index structures. Recently, the idea of replacing traditional indices with machine learning (ML) models has attracted a lot of attention. This includes works on learned spatial indices, where the main challenge is to address the lack of a total ordering among objects in a multidimensional space. In this work, we investigate how to extend this type of index design to spatio-textual data. We study different design choices, based on either loose or tight coupling between the spatial and textual parts, as well as a hybrid index that combines a traditional and a learned component. We also perform an experimental evaluation using several real-world datasets to assess the potential benefits of using a learned index for evaluating spatio-textual queries.