The Chinese Academy of Sciences Information Retrieval team (CIR) participated in the NTCIR-17 ULTRE-2 task. This paper describes our approaches and reports our results on that task. We find that false negatives in the Baidu search data used in this competition are a severe problem, much more severe than position bias. Hence, we adopt the Dual Learning Algorithm (DLA) to address position bias and use it as an auxiliary model to study how to alleviate the false negative issue. We approach the problem from two perspectives: 1) correcting the labels of non-clicked items with a relevance judgment model trained by DLA, and learning a new ranker initialized from DLA; 2) including random documents as true negatives and partially matching documents as hard negatives. Both methods enhance model performance, and our best method achieves an nDCG@10 of 0.5355, which is 2.66% better than the best score from the organizer.
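As a rough illustration of the first perspective (label correction for non-clicked items), the following is a minimal sketch only. The abstract does not specify the correction rule, so the function name `correct_labels`, the `threshold` parameter, and the choice of using the relevance score as a soft label are all assumptions made for illustration, not the method used in the paper.

```python
from typing import List, Sequence


def correct_labels(
    clicks: Sequence[int],
    relevance_scores: Sequence[float],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the paper
) -> List[float]:
    """Return training labels with suspected false negatives corrected.

    clicks: 1 if the document was clicked, 0 otherwise.
    relevance_scores: scores in [0, 1] from a relevance judgment model
        (assumed here to come from a DLA-trained ranker).
    """
    if len(clicks) != len(relevance_scores):
        raise ValueError("clicks and relevance_scores must have equal length")

    corrected: List[float] = []
    for click, score in zip(clicks, relevance_scores):
        if click:
            # Clicked documents are kept as positives.
            corrected.append(1.0)
        elif score >= threshold:
            # Non-clicked but judged relevant: treat as a likely false
            # negative and use the model score as a soft label (assumption).
            corrected.append(float(score))
        else:
            # Non-clicked and judged non-relevant: keep as a negative.
            corrected.append(0.0)
    return corrected
```

Under these assumptions, a new ranker initialized from the DLA model could then be trained on the corrected labels instead of the raw clicks.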