Customer behavioral data significantly impacts e-commerce search systems. However, for less common queries, the associated behavioral data tends to be sparse and noisy, offering inadequate support to the search mechanism. To address this challenge, query reformulation has been introduced: less common queries can leverage the behavior patterns of popular queries with similar meanings. In Amazon product search, query reformulation has proven effective in improving search relevance and boosting overall revenue. Nonetheless, adapting this method to smaller or emerging businesses operating in regions with lower traffic and complex multilingual settings poses challenges in scalability and extensibility. This study addresses these challenges by building a query reformulation solution that functions effectively even when the training data is limited in both quality and scale and the linguistic characteristics are relatively complex. In this paper, we provide an overview of the solution implemented within the Amazon product search infrastructure, which encompasses refining the data mining process, redefining model training objectives, and reshaping training strategies. The effectiveness of the proposed solution is validated through online A/B tests on search ranking and Ads matching. Notably, employing the proposed solution in search ranking increased overall revenue by 0.14% and 0.29% in the Japanese and Hindi cases, respectively, and yielded a 0.08% incremental gain over the legacy implementation in the English case; employing it in search Ads matching increased Ads revenue by 0.36% in the Japanese case.