The Chinese geographic re-ranking task aims to find the most relevant addresses among retrieved candidates, which is crucial for location-related services such as navigation maps. Unlike general sentences, geographic text is closely tied to geographical concepts, ranging from general spans (e.g., province) to specific spans (e.g., road). Building on this property, we propose a framework, Geo-Encoder, that integrates Chinese geographical semantics into re-ranking pipelines more effectively. Our method first uses off-the-shelf tools to associate text with geographical spans and treats these spans as chunking units. We then present a multi-task learning module that simultaneously learns an attention matrix determining how much each chunk contributes to extra semantic representations. Furthermore, we introduce an asynchronous update mechanism for the proposed additional task, which guides the model to focus effectively on specific chunks. Experiments on two distinct Chinese geographic re-ranking datasets show that Geo-Encoder achieves significant improvements over state-of-the-art baselines. Notably, it raises the Hit@1 score of MGEO-BERT on the GeoTES dataset by 6.22 points, from 62.76 to 68.98.
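To make the idea of chunk contributions concrete, the following is a minimal sketch of attention-weighted chunk pooling followed by dot-product re-ranking. It is an illustration under stated assumptions, not the Geo-Encoder implementation: the function names `pool_chunks` and `rerank`, the toy two-dimensional vectors, and the hand-set chunk scores are all hypothetical, whereas in the framework described above the attention weights are learned through the multi-task module.

```python
import math


def softmax(scores):
    """Normalise raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def pool_chunks(chunk_vectors, chunk_scores):
    """Combine chunk vectors into one vector, weighted by softmax(chunk_scores).

    Assumption: each chunk (e.g. province, city, road) already has a vector
    and a scalar score; here both are supplied by hand.
    """
    weights = softmax(chunk_scores)
    pooled = [0.0] * len(chunk_vectors[0])
    for weight, vector in zip(weights, chunk_vectors):
        for i, value in enumerate(vector):
            pooled[i] += weight * value
    return pooled, weights


def rerank(query_vector, candidates):
    """Sort (name, vector) candidates by dot product with the query, best first."""
    def score(candidate):
        return sum(q * x for q, x in zip(query_vector, candidate[1]))
    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    # Hypothetical chunks of one query address: province, city, road.
    chunk_vectors = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
    chunk_scores = [0.1, 0.3, 2.0]  # a higher score puts more weight on the road chunk
    query_vector, weights = pool_chunks(chunk_vectors, chunk_scores)
    candidates = [("candidate_a", [1.0, 0.0]), ("candidate_b", [0.0, 1.0])]
    print(weights)
    print([name for name, _ in rerank(query_vector, candidates)])
```

In this sketch a larger score on a specific chunk, such as the road, pulls the pooled representation toward that chunk, which is the kind of focus the asynchronous update mechanism is meant to encourage.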