Ensemble Ranking Model with Multiple Pretraining Strategies for Web Search

An effective ranking model usually requires a large amount of training data to learn the relevance between documents and queries. User clicks are often used as training data since they can indicate relevance and are cheap to collect, but they contain substantial bias and noise. There has been some work on mitigating various types of bias in simulated user clicks to train effective learning-to-rank models based on multiple features. However, how to effectively use such methods on large-scale pre-trained models with real-world click data is unknown. To alleviate the data bias in the real world, we incorporate heuristic-based features, refine the ranking objective, add random negatives, and calibrate the propensity calculation in the pre-training stage. Then we fine-tune several pre-trained models and train an ensemble model to aggregate all the predictions from various pre-trained models with human-annotation data in the fine-tuning stage. Our approaches won 3rd place in the "Pre-training for Web Search" task in WSDM Cup 2023 and are 22.6% better than the 4th-ranked team.

翻译：有效的排序模型通常需要大量训练数据来学习文档与查询之间的相关性。用户点击常被用作训练数据，因其能指示相关性且收集成本低廉，但这些数据包含显著偏差和噪声。已有研究通过模拟用户点击来缓解多种类型偏差，并基于多重特征训练有效的学习排序模型。然而，如何在大规模预训练模型上有效应用此类方法处理真实点击数据仍是未知。为缓解现实世界中的数据偏差，我们在预训练阶段引入启发式特征、优化排序目标、添加随机负样本并校准倾向性计算。随后，在微调阶段对多个预训练模型进行微调，并训练一个集成模型来聚合所有预训练模型对人工标注数据的预测结果。我们的方法在WSDM Cup 2023"网络搜索预训练"任务中获得第三名，且性能比第四名团队高出22.6%。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

NeurIPS 2021 | 寻MixTraining: 一种全新的物体检测训练范式

专知会员服务

12+阅读 · 2021年12月9日

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

【Google 大脑】使用上千个优化任务学习超参数搜索策略，Using a thousand optimization tasks to learn hyperparameter search strategies

专知会员服务

18+阅读 · 2020年3月14日