Effective relevance modeling is crucial for e-commerce search, as it aligns search results with user intent and enhances customer experience. Recent work has leveraged large language models (LLMs) to address the limitations of traditional relevance models, especially for long-tail and ambiguous queries. By incorporating Chain-of-Thought (CoT) reasoning, these approaches improve both accuracy and interpretability through multi-step reasoning. However, two key limitations remain: (1) most existing approaches rely on single-perspective CoT reasoning, which fails to capture the multifaceted nature of e-commerce relevance (e.g., user intent vs. attribute-level matching vs. business-specific rules); and (2) although CoT-enhanced LLMs offer rich reasoning capabilities, their high inference latency necessitates knowledge distillation for real-time deployment, yet current distillation methods discard the CoT rationale structure at inference, using it only as a transient auxiliary signal and forfeiting its reasoning utility. To address these challenges, we propose a novel framework that better exploits CoT semantics throughout the optimization pipeline. Specifically, the teacher model leverages Multi-Perspective CoT (MPCoT) to generate diverse rationales and combines Supervised Fine-Tuning (SFT) with Direct Preference Optimization (DPO) to construct a more robust reasoner. For distillation, we introduce Latent Reasoning Knowledge Distillation (LRKD), which equips the student model with a lightweight inference-time latent reasoning extractor, allowing efficient, low-latency internalization of the LLM's sophisticated reasoning capabilities. Evaluated in offline experiments and online A/B tests on an e-commerce search advertising platform serving tens of millions of users daily, our method delivers significant offline gains and clear online benefits in both commercial performance and user experience.