Graded labels are ubiquitous in real-world learning-to-rank applications, especially in human-rated relevance data. Traditional learning-to-rank techniques aim to optimize the ranked order of documents, but they typically do not predict the actual grades. This prevents them from being adopted in applications where grades matter, such as filtering out ``poor'' documents. Achieving both good ranking performance and good grade prediction performance remains an under-explored problem. Existing research either focuses only on ranking performance and leaves model outputs uncalibrated, or treats grades as numerical values, which assumes that labels lie on a linear scale and fails to leverage the ordinal grade information. In this paper, we conduct a rigorous study of learning to rank with grades, where both ranking performance and grade prediction performance are important. We provide a formal discussion of how to perform ranking with non-scalar predictions for grades, and propose a multiobjective formulation to jointly optimize both ranking and grade predictions. In experiments on several public datasets, we verify that our methods push the Pareto frontier of the tradeoff between ranking and grade prediction performance, showing the benefit of leveraging ordinal grade information.