An effective ranking model usually requires a large amount of training data to learn the relevance between documents and queries. User clicks are often used as training data because they indicate relevance and are cheap to collect, but they contain substantial bias and noise. Prior work has mitigated various types of bias in simulated user clicks to train effective learning-to-rank models based on multiple features. However, it remains unclear how to apply such methods effectively to large-scale pre-trained models with real-world click data. To alleviate real-world data bias, we incorporate heuristic-based features, refine the ranking objective, add random negatives, and calibrate the propensity calculation in the pre-training stage. In the fine-tuning stage, we use human-annotated data to fine-tune several pre-trained models and to train an ensemble model that aggregates their predictions. Our approach won 3rd place in the "Pre-training for Web Search" task in WSDM Cup 2023 and outperformed the 4th-ranked team by 22.6%.
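To make the debiasing idea concrete, the following is a minimal sketch of a generic inverse-propensity-weighted (IPW) listwise click loss with random negatives. It is not the objective used in this work, whose details are not given here. The power-law propensity `(1 / rank) ** eta`, the clipping threshold `min_propensity`, and all function and parameter names are assumptions made for illustration.

```python
import math


def position_propensity(rank, eta=1.0):
    """Assumed examination probability for a 1-based rank: (1 / rank) ** eta."""
    return (1.0 / rank) ** eta


def log_softmax(scores):
    """Numerically stable log-softmax over a list of scores."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def ipw_softmax_loss(scores, clicks, ranks, negative_scores=(),
                     eta=1.0, min_propensity=0.05):
    """IPW softmax cross-entropy over one displayed result list.

    scores          -- model scores for the displayed documents
    clicks          -- 0/1 click labels, aligned with scores
    ranks           -- 1-based display positions, aligned with scores
    negative_scores -- scores of randomly sampled, never-displayed documents;
                       they only enlarge the softmax denominator
    """
    all_scores = list(scores) + list(negative_scores)
    log_probs = log_softmax(all_scores)
    loss = 0.0
    # zip stops at the displayed documents, so negatives never receive a label.
    for log_p, click, rank in zip(log_probs, clicks, ranks):
        if click:
            # Clip the propensity so clicks at deep positions do not dominate.
            propensity = max(position_propensity(rank, eta), min_propensity)
            loss -= log_p / propensity
    return loss


# A click at rank 2 is up-weighted relative to the same click at rank 1.
loss_top = ipw_softmax_loss([0.0, 0.0], [1, 0], [1, 2])
loss_low = ipw_softmax_loss([0.0, 0.0], [0, 1], [1, 2])
```

In this sketch a click at a lower position receives a larger weight, on the assumption that such positions are examined less often. Calibrating the propensity would amount to choosing `eta` (or a different propensity model) so that the weights match the observed examination behavior.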