Building a natural language interface to a database (NLIDB) has recently gained significant attention from both the database and Natural Language Processing (NLP) communities. With the proliferation of geospatial datasets driven by the rapid emergence of location-aware sensors, geospatial databases play a vital role in supporting geospatial applications. However, querying geospatial and temporal databases differs substantially from querying traditional relational databases because of the geospatial topological operators and temporal operators involved. To bridge the gap between geospatial query languages and non-expert users, the geospatial research community has increasingly focused on developing NLIDBs for geospatial databases. Yet existing research remains fragmented across systems, datasets, and methodological choices, making it difficult to understand the landscape of existing methods, their strengths and weaknesses, and the opportunities for future research. Existing surveys on NLIDBs focus on general-purpose database systems and do not treat geospatial and temporal databases as a primary focus of analysis. To address this gap, this paper presents a comprehensive survey of studies on NLIDBs for geospatial and temporal databases. Specifically, we provide a detailed overview of datasets and evaluation metrics, a taxonomy of methods for geospatial and temporal NLIDBs, and a comparative analysis of the existing methods. Our survey reveals recurring trends in existing methods, substantial variation in datasets and evaluation practices, and several open challenges that continue to hinder progress in this area. Based on these findings, we identify promising directions for future research to advance natural language interfaces to geospatial and temporal databases.