Query-service relevance prediction in e-commerce search systems faces strict latency requirements that prevent the direct application of Large Language Models (LLMs). To bridge this gap, we propose a two-stage reasoning distillation framework that transfers reasoning capabilities from a powerful teacher LLM to a lightweight, deployment-friendly student model. In the first stage, we address the limitations of general-purpose LLMs by constructing a domain-adapted teacher model. This is achieved through a three-step process: domain-adaptive pre-training to inject platform knowledge, supervised fine-tuning to elicit reasoning skills, and preference optimization with a multi-dimensional reward model to ensure the generation of reliable, preference-aligned reasoning paths. This teacher can then automatically annotate massive volumes of query-service pairs from search logs with both relevance labels and reasoning chains. In the second stage, to address the challenge of architectural heterogeneity in standard distillation, we introduce Contrastive Reasoning Self-Distillation (CRSD). By modeling the behavior of the same student model under "standard" and "reasoning-augmented" inputs as a teacher-student relationship, CRSD enables the lightweight model to internalize the teacher's complex decision-making mechanisms without requiring the explicit reasoning path at inference time. Offline evaluations and online A/B testing in the Meituan search advertising system demonstrate that our framework achieves significant improvements across multiple metrics, validating its effectiveness and practical value.
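The abstract does not specify the exact CRSD objective; the following is only a minimal illustrative sketch, assuming a classification-style relevance head, a temperature `tau`, and a weighting coefficient `alpha` (all hypothetical names not taken from the paper), of how running the same student twice and distilling the standard branch toward the reasoning-augmented branch could look.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch of Contrastive Reasoning Self-Distillation (CRSD):
# the same lightweight model is run on a standard query-service input and
# on a reasoning-augmented input; the standard branch (used at serving time)
# is trained to match the reasoning-augmented branch.
def crsd_loss(model, std_inputs, reasoning_inputs, labels, tau=2.0, alpha=0.5):
    # "Teacher" branch: same weights, reasoning-augmented input, no gradient.
    with torch.no_grad():
        teacher_logits = model(**reasoning_inputs)   # [batch, num_classes]

    # "Student" branch: standard input only, as at inference time.
    student_logits = model(**std_inputs)             # [batch, num_classes]

    # Supervised relevance loss on the (teacher-annotated) labels.
    ce = F.cross_entropy(student_logits, labels)

    # Distillation term: KL divergence between temperature-softened
    # distributions, pulling the standard branch toward the reasoning branch.
    kl = F.kl_div(
        F.log_softmax(student_logits / tau, dim=-1),
        F.softmax(teacher_logits / tau, dim=-1),
        reduction="batchmean",
    ) * tau * tau

    return ce + alpha * kl
```

In this sketch the reasoning-augmented pass is detached from the gradient, so the lightweight model can internalize the reasoning-conditioned behavior while only the standard input path is needed at inference.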