Search query variation is a challenge in e-commerce search because equivalent search intents can be expressed through queries that differ only at the surface level. This paper introduces a framework to recognize and leverage query equivalence to improve searcher and business outcomes. The proposed approach addresses three key problems: mapping queries to vector representations of search intent, identifying nearest-neighbor queries that express equivalent or similar intent, and optimizing for user or business objectives. The framework uses both surface similarity and behavioral similarity to determine query equivalence. Surface similarity canonicalizes queries with respect to word inflection, word order, compounding, and noise words. Behavioral similarity uses historical search behavior to generate vector representations of query intent. An offline process trains a sentence similarity model, while an online nearest-neighbor approach handles unseen queries. In experimental evaluations, the proposed approach outperforms popular sentence transformer models and achieves a Pearson correlation of 0.85 for query similarity. The results highlight the potential of using historical behavior data and trained models to recognize and exploit query equivalence in e-commerce search, leading to improved user experiences and business outcomes. We encourage further advances and the creation of benchmark datasets to support the development of solutions to this critical problem in the e-commerce domain.
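As an illustration of the surface-similarity step, the following is a minimal sketch of query canonicalization covering the four factors named in the abstract (word inflection, word order, compounding, and noise words). It is not the paper's implementation: the noise-word list, the compound vocabulary, and the crude plural-stripping rules are all hypothetical stand-ins, and a production system would likely use a proper lemmatizer and catalog-derived vocabularies.

```python
import re

# Hypothetical noise-word list (assumption; would be tuned per catalog).
NOISE_WORDS = {"a", "an", "the", "for", "of", "with", "in"}

# Hypothetical vocabulary of known compounds (assumption).
COMPOUNDS = {"backpack", "laptop", "headphone"}


def naive_stem(token: str) -> str:
    """Crude plural stripping; a stand-in for a real stemmer or lemmatizer."""
    if len(token) > 3 and token.endswith("ies"):
        return token[:-3] + "y"
    if len(token) > 3 and token.endswith(("xes", "ches", "shes", "sses")):
        return token[:-2]
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def merge_compounds(tokens: list[str]) -> list[str]:
    """Join adjacent tokens whose concatenation is a known compound."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] + tokens[i + 1] in COMPOUNDS:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def canonicalize(query: str) -> str:
    """Map a query to a canonical surface form."""
    text = query.lower().replace("'", "")
    tokens = re.findall(r"[a-z0-9]+", text)
    tokens = [t for t in tokens if t not in NOISE_WORDS]  # noise words
    tokens = [naive_stem(t) for t in tokens]              # word inflection
    tokens = merge_compounds(tokens)                      # compounding
    return " ".join(sorted(tokens))                       # word order


if __name__ == "__main__":
    for q in ["Back packs for men", "men's backpack", "backpacks men"]:
        print(q, "->", canonicalize(q))  # each maps to "backpack men"
```

Queries that share a canonical form can then be treated as surface-equivalent. Queries that differ beyond these surface factors would fall to the behavioral-similarity model described in the abstract.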