The distillation of ranking models has become an important topic in both academia and industry. In recent years, several advanced methods have been proposed to tackle this problem, often leveraging ranking information from teacher rankers that is absent in traditional classification settings. To date, there is no well-established consensus on how to evaluate this class of models. Moreover, inconsistent benchmarking across a wide range of tasks and datasets makes it difficult to assess or invigorate advances in this field. This paper first examines representative prior work on ranking distillation and raises three questions about methodology and reproducibility. To answer them, we propose a systematic and unified benchmark, Ranking Distillation Suite (RD-Suite), a suite of tasks built on four large real-world datasets that covers two major modalities (textual and numeric) and two applications (standard distillation and distillation transfer). RD-Suite provides benchmark results that challenge some of the common wisdom in the field, and it releases datasets with teacher scores along with evaluation scripts for future research. RD-Suite paves the way towards a better understanding of ranking distillation, facilitates more research in this direction, and presents new challenges.