Spatial natural language interfaces to databases give non-expert users convenient access to spatial data through natural language queries. However, the scarcity of high-quality spatial natural language query corpora limits the performance of such systems. Existing methods rely on manual knowledge base construction and template-based dynamic generation, which suffer from low construction efficiency and unstable corpus quality. This paper presents semantic-aware spatial corpus construction (SSCC), a tool for building high-quality corpora of paired spatial natural language queries and executable language queries. SSCC consists of two core modules: (i) a knowledge base construction module based on spatial relations, which extracts and determines spatial relations from datasets, and (ii) a template-augmented query pair corpus generation module, which produces query pairs via template matching and parameter substitution. The tool ensures that the generated spatial relations are geometrically consistent and adhere to spatial logic. Experimental results demonstrate that SSCC achieves (i) a 53x efficiency improvement in knowledge base construction and (ii) a 2.5x effectiveness improvement for the query pair corpus. SSCC provides high-quality corpus support for training spatial natural language interfaces, substantially reducing both the time and labor costs of corpus construction.
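As a rough illustration of the template matching and parameter substitution step named in the abstract, the following minimal sketch pairs a natural language template with an executable query template for each spatial relation and fills both from knowledge base entries. It is not SSCC's implementation: the `SpatialFact` record, the `TEMPLATES` table, the `generate_pairs` function, the table and column names, and the choice of PostGIS-style SQL as the executable language are all assumptions made for this example.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class SpatialFact:
    """Hypothetical knowledge base entry: two layers linked by a spatial relation."""
    subject: str
    relation: str
    object: str


# Hypothetical template table: relation name -> (NL template, SQL template).
# The SQL side assumes PostGIS-style predicates and `name` / `geom` columns.
TEMPLATES = {
    "within": (
        Template("Which $subject are within $object?"),
        Template(
            "SELECT a.name FROM $subject a, $object b "
            "WHERE ST_Within(a.geom, b.geom);"
        ),
    ),
    "intersects": (
        Template("Which $subject intersect $object?"),
        Template(
            "SELECT a.name FROM $subject a, $object b "
            "WHERE ST_Intersects(a.geom, b.geom);"
        ),
    ),
}


def generate_pairs(facts):
    """Match each fact to a template by relation and substitute its parameters."""
    pairs = []
    for fact in facts:
        templates = TEMPLATES.get(fact.relation)
        if templates is None:
            # No template matches this relation, so the fact yields no pair.
            continue
        nl_template, sql_template = templates
        params = {"subject": fact.subject, "object": fact.object}
        pairs.append(
            (nl_template.substitute(params), sql_template.substitute(params))
        )
    return pairs


if __name__ == "__main__":
    facts = [
        SpatialFact("parks", "within", "districts"),
        SpatialFact("roads", "intersects", "rivers"),
    ]
    for question, query in generate_pairs(facts):
        print(question)
        print(query)
```

Keeping the natural language and executable templates side by side under one relation key means each generated pair is consistent by construction, at the cost of limiting linguistic variety to what the templates encode.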