This paper describes the THUIR team's approach to the WSDM Cup 2023 Pre-training for Web Search task, which requires participants to rank documents by relevance for each query. We propose a new data pre-processing method and use the processed data for both pre-training and fine-tuning. We further extract statistical, axiomatic, and semantic features to improve ranking performance, and employ diverse learning-to-rank models to merge these features. Experimental results demonstrate the effectiveness of our approach, which achieved second place in the competition.
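To make the feature-merging step concrete, the following is a minimal sketch of how a learning-to-rank model can combine per-document features into a single ranking score. It is an illustration under stated assumptions, not the team's implementation: the abstract does not name the models used, so a simple linear pairwise logistic ranker stands in for them, and the three feature columns (statistical, axiomatic, semantic), their values, and the function names `train_pairwise_ranker` and `rank` are all hypothetical.

```python
import math
import random


def train_pairwise_ranker(queries, n_features, epochs=200, lr=0.1, seed=0):
    """Fit a linear scoring function with a pairwise logistic loss.

    queries: list of queries, each a list of (feature_vector, relevance_label).
    Returns the learned weight vector.
    """
    rng = random.Random(seed)
    w = [0.0] * n_features

    # Collect preference pairs: within a query, document i should outrank
    # document j whenever its relevance label is strictly higher.
    pairs = []
    for docs in queries:
        for xi, yi in docs:
            for xj, yj in docs:
                if yi > yj:
                    pairs.append((xi, xj))

    for _ in range(epochs):
        rng.shuffle(pairs)
        for xi, xj in pairs:
            diff = [a - b for a, b in zip(xi, xj)]
            margin = sum(wk * dk for wk, dk in zip(w, diff))
            # Loss is log(1 + exp(-margin)); its negative gradient with
            # respect to w is sigmoid(-margin) * diff.
            g = 1.0 / (1.0 + math.exp(margin))
            w = [wk + lr * g * dk for wk, dk in zip(w, diff)]
    return w


def rank(w, docs):
    """Return document indices sorted by descending model score."""
    scores = [sum(wk * xk for wk, xk in zip(w, x)) for x, _ in docs]
    return sorted(range(len(docs)), key=lambda i: -scores[i])


# Hypothetical toy data. Each feature vector is
# [statistical, axiomatic, semantic]; labels are graded relevance.
q1 = [([0.9, 0.7, 0.8], 2), ([0.5, 0.4, 0.6], 1), ([0.1, 0.2, 0.3], 0)]
q2 = [([0.2, 0.1, 0.2], 0), ([0.8, 0.6, 0.9], 2), ([0.4, 0.5, 0.3], 1)]

w = train_pairwise_ranker([q1, q2], n_features=3)
print(rank(w, q1))
print(rank(w, q2))
```

In practice, a gradient-boosted ranker or a neural model would typically replace the linear scorer; the pairwise setup is shown only because it keeps the feature-fusion idea visible in a few lines.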