In e-commerce search, relevance between queries and documents is essential to a satisfying user experience. Unlike traditional e-commerce platforms, where users search for products, users of life service platforms such as Meituan search mainly for product providers, which usually carry abundant structured information, e.g., name, address, category, and thousands of products. Modeling search relevance with such rich structured content is challenging for two reasons: (1) the language distribution differs across the fields of a structured document, which makes it difficult to directly adopt off-the-shelf methods based on pretrained language models such as BERT; (2) fields usually differ in importance and vary greatly in length, which makes it difficult to extract the document information that is helpful for relevance matching. To tackle these issues, we propose a novel two-stage pretraining and matching architecture for relevance matching with rich structured documents. In the pretraining stage, we propose an effective pretraining method that takes both the query and multiple document fields as inputs and includes an effective information compression method for lengthy fields. In the relevance matching stage, we propose a novel matching method that leverages domain knowledge in the search query to generate more effective document representations for relevance scoring. Extensive offline experiments and online A/B tests on millions of users verify that the proposed architecture effectively improves the performance of relevance modeling. The model has been deployed online and has served Meituan's search traffic for over a year.