The distillation of ranking models has become an important topic in both academia and industry. In recent years, several advanced methods have been proposed to tackle this problem, often leveraging ranking information from teacher rankers that is absent in traditional classification settings. To date, there is no well-established consensus on how to evaluate this class of models. Moreover, inconsistent benchmarking across a wide range of tasks and datasets makes it difficult to assess or invigorate advances in this field. This paper first examines representative prior work on ranking distillation and raises three questions to be answered around methodology and reproducibility. To that end, we propose a systematic and unified benchmark, Ranking Distillation Suite (RD-Suite), which is a suite of tasks with four large real-world datasets, encompassing two major modalities (textual and numeric) and two applications (standard distillation and distillation transfer). RD-Suite comprises benchmark results that challenge some of the common wisdom in the field, together with released datasets with teacher scores and evaluation scripts for future research. RD-Suite paves the way towards a better understanding of ranking distillation, facilitates more research in this direction, and presents new challenges.