Graded labels are ubiquitous in real-world learning-to-rank applications, especially in human-rated relevance data. Traditional learning-to-rank techniques aim to optimize the ranked order of documents; however, they typically ignore predicting the actual grades. This prevents them from being adopted in applications where grades matter, such as filtering out "poor" documents. Achieving both good ranking performance and good grade prediction performance remains an under-explored problem. Existing research either focuses only on ranking performance and leaves model outputs uncalibrated, or treats grades as numerical values, which assumes that labels lie on a linear scale and fails to leverage ordinal grade information. In this paper, we conduct a rigorous study of learning to rank with grades, where both ranking performance and grade prediction performance are important. We provide a formal discussion of how to perform ranking with non-scalar predictions for grades, and propose a multiobjective formulation to jointly optimize ranking and grade predictions. In experiments on several public datasets, we verify that our methods push the Pareto frontier of the trade-off between ranking and grade prediction performance, showing the benefit of leveraging ordinal grade information.
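To make "ranking with non-scalar predictions for grades" and a joint ranking/grade objective concrete, here is a minimal Python sketch. It is not the formulation proposed in this paper. It assumes one common ordinal decomposition: each document receives K-1 logits, one per threshold event "grade > k". Under this assumption, documents are ranked by the expected grade E[y] = sum_k P(y > k), and the predicted grade is the number of thresholds whose exceedance probability is above 0.5. The sketch combines two losses through a hypothetical trade-off weight `alpha`: a softmax listwise ranking loss and a per-threshold binary cross-entropy grade loss. All function names are illustrative.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def exceed_probs(logits: Sequence[float]) -> List[float]:
    # Assumption: logits[k] parameterizes P(grade > k) for k = 0..K-2.
    return [sigmoid(z) for z in logits]


def ranking_score(logits: Sequence[float]) -> float:
    # Expected grade over {0, ..., K-1}: E[y] = sum_k P(y > k).
    return sum(exceed_probs(logits))


def predict_grade(logits: Sequence[float]) -> int:
    # Predicted grade = number of thresholds the document likely exceeds.
    return sum(1 for p in exceed_probs(logits) if p > 0.5)


def ordinal_loss(logits: Sequence[float], grade: int) -> float:
    # Sum of binary cross-entropies, one per threshold event "grade > k".
    loss = 0.0
    for k, p in enumerate(exceed_probs(logits)):
        target = 1.0 if grade > k else 0.0
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        loss -= target * math.log(p) + (1.0 - target) * math.log(1.0 - p)
    return loss


def listwise_ranking_loss(scores: Sequence[float], grades: Sequence[int]) -> float:
    # Softmax cross-entropy between the normalized grade distribution
    # and the softmax distribution of the ranking scores.
    total = float(sum(grades))
    if total == 0:
        return 0.0  # no relevant documents: nothing to rank
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return -sum((g / total) * (s - log_z) for s, g in zip(scores, grades))


def multiobjective_loss(
    list_logits: Sequence[Sequence[float]],
    grades: Sequence[int],
    alpha: float = 0.5,
) -> float:
    # Hypothetical weighted sum: alpha trades ranking quality against
    # grade prediction quality for one list of documents.
    scores = [ranking_score(logits) for logits in list_logits]
    rank_loss = listwise_ranking_loss(scores, grades)
    grade_loss = sum(
        ordinal_loss(logits, g) for logits, g in zip(list_logits, grades)
    ) / len(grades)
    return alpha * rank_loss + (1.0 - alpha) * grade_loss


# Usage: three documents, K = 3 grades, so two threshold logits each.
docs = [[2.0, 1.0], [1.0, -1.0], [-2.0, -3.0]]
grades = [2, 1, 0]
print([predict_grade(d) for d in docs])  # [2, 1, 0]
print(multiobjective_loss(docs, grades, alpha=0.5))
```

In this sketch, setting `alpha` to 1 optimizes only the ranking loss and setting it to 0 optimizes only the grade loss; sweeping it between the two yields different trade-off points between ranking and grade prediction, which is the kind of trade-off a Pareto frontier describes.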