This paper proposes a novel method for improving the accuracy of e-commerce product search using cluster language models. The method addresses limitations of the bi-encoder architecture while adding only a minimal training burden. It proceeds in three steps: label the top products for each query, group semantically similar queries into clusters with the K-Means algorithm, and fine-tune a global language model on each cluster to obtain a cluster language model. Fine-tuning lets each cluster language model efficiently learn the local manifold of its region of the feature space, capturing the nuances of the query types within that cluster. At inference time, a new query is assigned to its cluster and the corresponding cluster language model is used for retrieval. The proposed method yields more accurate and personalized retrieval, offering a superior alternative to the popular bi-encoder based retrieval models in semantic search.
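The clustering and routing steps described above can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the embedding model, the number of clusters, or the fine-tuning objective, so the 2-D toy embeddings, the first-k centroid initialisation, and the names `kmeans`, `route`, and `cluster_models` are all assumptions made for illustration. Fine-tuning itself is omitted; each cluster language model is represented by a placeholder string.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids: List[List[float]], v: Vector) -> int:
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[List[float]]:
    """Plain Lloyd's algorithm.

    Assumption: centroids are initialised from the first k points to keep
    the sketch deterministic; a real system would use a better seeding.
    """
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:  # assignments have stabilised
            break
        centroids = updated
    return centroids


def route(
    query_vec: Vector,
    centroids: List[List[float]],
    cluster_models: Dict[int, str],
) -> Tuple[int, str]:
    """Assign a new query to its cluster and pick that cluster's model."""
    cid = nearest(centroids, query_vec)
    return cid, cluster_models[cid]


# Toy query embeddings (hypothetical; a real pipeline would obtain them
# from the global language model).
train_queries = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9]]
centroids = kmeans(train_queries, k=2)

# Placeholders standing in for the fine-tuned cluster language models.
cluster_models = {0: "cluster-0-encoder", 1: "cluster-1-encoder"}

cluster_id, model_name = route([4.8, 5.3], centroids, cluster_models)
```

In this sketch the only per-query cost added at inference is one nearest-centroid lookup, after which retrieval would proceed with the selected cluster model in the same way as with a single global encoder.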