Generative retrieval reframes document retrieval as directly generating the identifier of a relevant document in response to a query. In the context of large language models, this paradigm has shown considerable benefits and potential, particularly in representation and generalization. In E-commerce search, however, it faces significant challenges: detailed item titles are hard to generate from brief queries, item titles are noisy and have weak word order, long-tail queries are common, and results are difficult to interpret. To address these challenges, we propose a framework for E-commerce search called generative retrieval with preference optimization. The framework trains an autoregressive model, aligns it with the target data, and then generates the final item through constraint-based beam search. It represents each raw item title with multi-span identifiers, so that the task of generating titles from queries becomes the task of generating multi-span identifiers from queries, with the aim of simplifying generation. The framework further aligns the model with human preferences using click data and uses a constrained search method to identify the key spans that retrieve the final item, which improves the interpretability of the results. Extensive experiments show that the framework achieves competitive performance on a real-world dataset, and online A/B tests demonstrate its effectiveness in improving conversion.
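To make the constrained decoding step concrete, the following is a minimal sketch of beam search restricted to valid spans, followed by a lookup from generated spans to items. It is an illustration under stated assumptions, not the framework's implementation: the abstract does not specify how spans are tokenized, stored, or scored, so the prefix trie, the `END` marker, the toy `make_toy_log_prob` scorer (a stand-in for the autoregressive model), the `ITEMS` data, and all function names are hypothetical. The preference-optimization stage is not shown.

```python
import math
from typing import Callable, Dict, List, Set, Tuple

END = "<eos>"  # hypothetical end-of-span marker

# Hypothetical catalog: each item is represented by several token spans.
ITEMS: Dict[str, List[List[str]]] = {
    "item_1": [["wireless", "mouse"], ["usb", "receiver"]],
    "item_2": [["wireless", "keyboard"]],
    "item_3": [["gaming", "mouse"]],
}


def build_trie(spans: List[List[str]]) -> dict:
    """Build a prefix trie over tokenized spans; END marks a complete span."""
    root: dict = {}
    for span in spans:
        node = root
        for tok in span:
            node = node.setdefault(tok, {})
        node[END] = {}
    return root


def make_toy_log_prob(query_tokens: Set[str]) -> Callable[[Tuple[str, ...], str], float]:
    """Toy scorer (assumption): favors tokens that appear in the query."""
    def log_prob(prefix: Tuple[str, ...], tok: str) -> float:
        if tok == END or tok in query_tokens:
            return math.log(0.9)
        return math.log(0.1)
    return log_prob


def constrained_beam_search(
    trie: dict,
    log_prob: Callable[[Tuple[str, ...], str], float],
    beam_size: int = 2,
    max_len: int = 8,
) -> List[Tuple[Tuple[str, ...], float]]:
    """Return up to beam_size complete spans with scores, best first."""
    beams = [((), 0.0, trie)]  # (prefix, cumulative log-prob, trie node)
    finished: List[Tuple[Tuple[str, ...], float]] = []
    for _ in range(max_len):
        candidates = []
        for prefix, score, node in beams:
            # Only tokens that extend a valid span are ever expanded.
            for tok, child in node.items():
                new_score = score + log_prob(prefix, tok)
                if tok == END:
                    finished.append((prefix, new_score))
                else:
                    candidates.append((prefix + (tok,), new_score, child))
        if not candidates:
            break
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    finished.sort(key=lambda f: f[1], reverse=True)
    return finished[:beam_size]


def retrieve(query: str, items: Dict[str, List[List[str]]], beam_size: int = 2):
    """Generate spans for the query, then map each span back to its items."""
    all_spans = [span for spans in items.values() for span in spans]
    trie = build_trie(all_spans)
    spans = constrained_beam_search(
        trie, make_toy_log_prob(set(query.split())), beam_size
    )
    ranked: List[str] = []
    for span, _ in spans:
        for item_id, item_spans in items.items():
            if list(span) in item_spans and item_id not in ranked:
                ranked.append(item_id)
    return ranked, spans


if __name__ == "__main__":
    ranked_items, generated_spans = retrieve("wireless mouse", ITEMS)
    print(ranked_items)
    print(generated_spans)
```

Because every expansion follows an edge of the trie, the decoder can only emit spans that exist in the catalog, and the returned spans show which part of a title caused each item to be retrieved. A real system would replace the toy scorer with model logits and would likely combine evidence from several spans per item rather than taking the first match.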