Text-based delivery addresses, as the data foundation of logistics systems, contain abundant and crucial location information. Effectively encoding delivery addresses is therefore a core task for boosting the performance of downstream tasks in logistics systems. Pre-trained Models (PTMs) designed for Natural Language Processing (NLP) have emerged as the dominant tools for encoding semantic information in text. Though promising, these NLP-based PTMs fall short of encoding the geographic knowledge in delivery addresses, which considerably degrades the performance of delivery-related tasks in logistics systems such as Cainiao's. To tackle this problem, we propose a domain-specific pre-trained model named G2PTL, a Geography-Graph Pre-trained model for delivery addresses in the Logistics field. G2PTL combines the semantic learning capabilities of text pre-training with the geographical-relationship encoding abilities of graph modeling. Specifically, we first use real-world logistics delivery data to construct a large-scale heterogeneous graph of delivery addresses, which contains abundant geographic knowledge and delivery information. Then, G2PTL is pre-trained with subgraphs sampled from this heterogeneous graph. Comprehensive experiments on four downstream logistics tasks, using real-world datasets, demonstrate the effectiveness of G2PTL. G2PTL has been deployed in production in Cainiao's logistics system, where it significantly improves the performance of delivery-related tasks.
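The abstract only outlines the pre-training pipeline. As a rough illustration of what "subgraphs sampled from the heterogeneous graph" can mean, the sketch below performs fixed-fanout k-hop neighborhood sampling around a seed address in a graph with typed edges. The class name `HeteroGraph`, the node names, the edge types (`same_aoi`, `same_courier`), and the sampling strategy itself are hypothetical; the abstract does not specify G2PTL's graph schema or how it samples subgraphs.

```python
import random


class HeteroGraph:
    """Hypothetical address graph: nodes are delivery addresses, edges carry a type."""

    def __init__(self):
        # node -> list of (neighbor, edge_type); edges are stored in both directions
        self.adj = {}

    def add_edge(self, u, v, edge_type):
        self.adj.setdefault(u, []).append((v, edge_type))
        self.adj.setdefault(v, []).append((u, edge_type))

    def sample_subgraph(self, seed, num_hops=2, fanout=3, rng=None):
        """Sample at most `fanout` neighbors per node, expanding `num_hops` hops from `seed`.

        This fixed-fanout neighborhood sampler is an illustrative assumption,
        not the sampling procedure described for G2PTL.
        """
        rng = rng or random.Random(0)
        nodes = {seed}
        edges = set()
        frontier = [seed]
        for _ in range(num_hops):
            next_frontier = []
            for u in frontier:
                neighbors = self.adj.get(u, [])
                picked = rng.sample(neighbors, min(fanout, len(neighbors)))
                for v, edge_type in picked:
                    # Store undirected edges in a canonical order so duplicates collapse.
                    a, b = sorted((u, v))
                    edges.add((a, b, edge_type))
                    if v not in nodes:
                        nodes.add(v)
                        next_frontier.append(v)
            frontier = next_frontier
        return nodes, edges


# Usage: a toy chain of four addresses; a 2-hop sample from addr_A stops before addr_D.
g = HeteroGraph()
g.add_edge("addr_A", "addr_B", "same_aoi")
g.add_edge("addr_B", "addr_C", "same_courier")
g.add_edge("addr_C", "addr_D", "same_aoi")
nodes, edges = g.sample_subgraph("addr_A", num_hops=2, fanout=3)
```

In a pre-training setup of this kind, each sampled subgraph (address texts plus typed edges) would be one training instance, letting a model see an address together with its geographic neighbors rather than in isolation.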