Recent work has revealed an essential paradigm in designing loss functions: distinguishing individual losses from aggregate losses. The individual loss measures the quality of the model on a single sample, while the aggregate loss combines the individual losses/scores over all training samples. Both share a common procedure that aggregates a set of individual values into a single numerical value. The ranking order reflects the most fundamental relation among individual values in designing losses. In addition, decomposability, in which a loss can be decomposed into an ensemble of individual terms, is a significant property for organizing losses/scores. This survey provides a systematic and comprehensive review of rank-based decomposable losses in machine learning. Specifically, we provide a new taxonomy of loss functions that follows the perspectives of aggregate loss and individual loss. We identify the aggregators that form such losses, which are examples of set functions. We organize the rank-based decomposable losses into eight categories. Following these categories, we review the literature on rank-based aggregate losses and rank-based individual losses. We describe general formulas for these losses and connect them with existing research topics. We also suggest future research directions spanning unexplored, remaining, and emerging issues in rank-based decomposable losses.
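As a rough illustration of the shared aggregation step described above, the following minimal sketch sorts a set of individual values and combines them with one weight per rank position, so the result is a sum of individual terms. The function names (`rank_weighted_aggregate`, `average_weights`, `top_k_weights`) and the sample values are hypothetical choices made for this example, not the survey's notation; the average, maximum, and average top-k aggregators are used only as familiar instances and are not claimed to be the survey's categories.

```python
def rank_weighted_aggregate(values, weights):
    """Sort values in descending order and return the rank-weighted sum.

    weights[i] multiplies the (i+1)-th largest value, so the result is a
    sum of individual terms whose coefficients depend only on rank.
    """
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per rank position")
    ranked = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ranked))


def average_weights(n):
    """Uniform weights: every rank counts equally (the plain average)."""
    return [1.0 / n] * n


def top_k_weights(n, k):
    """Weight 1/k on the k largest values, 0 elsewhere (average top-k)."""
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    return [1.0 / k] * k + [0.0] * (n - k)


# Hypothetical individual losses for four training samples.
losses = [0.5, 2.0, 0.25, 1.25]

mean_loss = rank_weighted_aggregate(losses, average_weights(len(losses)))
max_loss = rank_weighted_aggregate(losses, top_k_weights(len(losses), 1))
top2_loss = rank_weighted_aggregate(losses, top_k_weights(len(losses), 2))
```

Under these assumptions, changing only the weight vector moves the aggregator between the average (all ranks equal) and the maximum (only the top rank), with average top-k in between; the input is treated as an unordered set, since sorting removes any dependence on sample order.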