Conversational user queries pose a growing challenge for traditional e-commerce platforms, whose search systems are typically optimized for keyword-based queries. We present an LLM-based semantic search framework that captures user intent in conversational queries by combining domain-specific embeddings with structured filters. To address the scarcity of labeled data, we use LLMs to generate synthetic data that guides the fine-tuning of two models: an embedding model that places semantically similar products close together in the representation space, and a generative model that converts natural-language queries into structured constraints. By combining similarity-based retrieval with constraint-based filtering, our framework achieves higher precision and recall than baseline approaches across a range of settings on a real-world dataset.
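The abstract does not say exactly how retrieval and filtering are combined. The following is a minimal Python sketch that assumes the simplest arrangement. First, products are filtered by structured constraints. In the described framework, the generative model would produce these constraints from the query; here they are hand-written. Second, the surviving products are ranked by cosine similarity to the query embedding. The `Product` class, the `(operator, value)` constraint format, `hybrid_search`, and the two-dimensional toy embeddings are all hypothetical stand-ins, not the authors' implementation.

```python
from __future__ import annotations

import math
import operator
from dataclasses import dataclass, field

# Comparison operators a constraint may use (hypothetical constraint format).
OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}


@dataclass
class Product:
    name: str
    embedding: list[float]  # toy stand-in for a fine-tuned product embedding
    attributes: dict[str, object] = field(default_factory=dict)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def satisfies(product: Product, constraints: dict[str, tuple[str, object]]) -> bool:
    """True if the product meets every (operator, value) constraint.
    A product that lacks a constrained attribute is filtered out."""
    for attr, (op, value) in constraints.items():
        if attr not in product.attributes:
            return False
        if not OPS[op](product.attributes[attr], value):
            return False
    return True


def hybrid_search(query_emb: list[float],
                  constraints: dict[str, tuple[str, object]],
                  catalog: list[Product],
                  k: int = 10) -> list[str]:
    """Constraint-based filtering, then similarity-based ranking (top k names)."""
    candidates = [p for p in catalog if satisfies(p, constraints)]
    candidates.sort(key=lambda p: cosine(query_emb, p.embedding), reverse=True)
    return [p.name for p in candidates[:k]]


if __name__ == "__main__":
    catalog = [
        Product("red running shoe", [1.0, 0.0], {"color": "red", "price": 80}),
        Product("blue running shoe", [0.9, 0.1], {"color": "blue", "price": 60}),
        Product("red dress", [0.0, 1.0], {"color": "red", "price": 50}),
    ]
    # e.g. "running shoes under $70" -> embedding [1.0, 0.0], price <= 70
    print(hybrid_search([1.0, 0.0], {"price": ("<=", 70)}, catalog, k=2))
    # prints ['blue running shoe', 'red dress']
```

Filtering before ranking guarantees that every returned item satisfies the hard constraints. The usual alternative is to retrieve the top-N items by similarity from an approximate nearest-neighbor index and then filter them. That scales better to large catalogs, but it can return fewer than k results when the constraints are selective.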