Achievement. We introduce LORE, a systematic framework for large generative model-based relevance in e-commerce search. Deployed and iterated over three years, LORE has achieved a cumulative +27% improvement in the online GoodRate metric. This report shares the experience gained across its development lifecycle, spanning data, features, training, evaluation, and deployment. Insight. Existing works apply Chain-of-Thought (CoT) to enhance relevance, but they often hit a performance ceiling. We argue that this stems from treating relevance as a monolithic task without a principled decomposition. Our key insight is that relevance comprises distinct capabilities: knowledge and reasoning, multi-modal matching, and rule adherence. We contend that a qualitatively driven decomposition into these capabilities is essential for breaking through current performance bottlenecks. Contributions. LORE provides a complete blueprint for the LLM relevance lifecycle. Key contributions include: (1) a two-stage training paradigm that combines progressive CoT synthesis via supervised fine-tuning (SFT) with human preference alignment via reinforcement learning (RL); (2) a comprehensive benchmark, RAIR, designed to evaluate these core capabilities; and (3) a query frequency-stratified deployment strategy that efficiently transfers offline LLM capabilities to the online system. LORE serves as both a practical solution and a methodological reference for other vertical domains.