Optimizing industrial search ranking models solely for user engagement signals often introduces systematic biases, favoring popular or price-anchored items that may not satisfy the semantic intent of the query. We present a production-scale multi-task ranking system that integrates semantic relevance as a primary optimization objective, enabling explicit and controllable relevance-engagement trade-offs. Our architecture employs an ordinal relevance head that predicts cumulative probabilities over relevance thresholds, preserving the inherent ordering of the labels. These outputs are combined with the engagement heads through a unified value-model scoring function, enabling systematic balancing of semantic quality against short-term behavioral signals. To provide high-quality supervision for this multi-task framework, we use fine-tuned lightweight large language models (LLMs) to generate three-level ordinal relevance labels: irrelevant, moderately relevant, and highly relevant. We address the sensitivity of these labels to the label distribution and ensure high alignment with human annotations, which enables efficient labeling of over 100 million query-item pairs. Evaluation with offline metrics, including NDCG@10, and with online A/B experiments demonstrates that our approach significantly improves semantic alignment while preserving core engagement objectives.
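To make the ordinal head and the value-model scoring concrete, the following is a minimal sketch rather than the production implementation. It assumes two sigmoid outputs for the thresholds "at least moderately relevant" and "highly relevant", a linear value model, and a single click head. The function names, the monotonicity clamp, the weights, and the example logits are all illustrative assumptions, not details taken from the system described above.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def ordinal_probs(logits: List[float]) -> Tuple[List[float], List[float]]:
    """Turn two threshold logits into cumulative and per-class probabilities.

    logits[0] scores P(label >= 1), logits[1] scores P(label >= 2), with
    labels 0 = irrelevant, 1 = moderately relevant, 2 = highly relevant.
    """
    p_ge1 = sigmoid(logits[0])
    # Assumption: clamp so that P(label >= 2) never exceeds P(label >= 1).
    p_ge2 = min(sigmoid(logits[1]), p_ge1)
    class_probs = [1.0 - p_ge1, p_ge1 - p_ge2, p_ge2]
    return [p_ge1, p_ge2], class_probs


def expected_relevance(cumulative: List[float]) -> float:
    # For integer labels 0..K, E[label] equals the sum of P(label >= k).
    return sum(cumulative)


def value_score(
    cumulative: List[float],
    engagement: Dict[str, float],
    w_rel: float,
    w_eng: Dict[str, float],
) -> float:
    """Hypothetical linear value model: weighted relevance plus weighted engagement."""
    score = w_rel * expected_relevance(cumulative)
    for name, prob in engagement.items():
        score += w_eng.get(name, 0.0) * prob
    return score


# Two illustrative items: A is popular but weakly relevant, B is the reverse.
cum_a, _ = ordinal_probs([-1.0, -3.0])
cum_b, _ = ordinal_probs([2.0, 1.0])
eng_a = {"click": 0.30}
eng_b = {"click": 0.20}
w_eng = {"click": 1.0}

# Engagement-only scoring (w_rel = 0) versus scoring with a relevance term.
engagement_only = (
    value_score(cum_a, eng_a, 0.0, w_eng),
    value_score(cum_b, eng_b, 0.0, w_eng),
)
with_relevance = (
    value_score(cum_a, eng_a, 0.5, w_eng),
    value_score(cum_b, eng_b, 0.5, w_eng),
)
```

In this sketch, raising `w_rel` from 0 to 0.5 reverses the order of the two items, which is the kind of explicit relevance-engagement control the abstract refers to. How the weights are actually chosen is not specified in the text.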