Many evaluation metrics can be used to assess the performance of models in binary classification tasks. However, most of them are derived from a confusion matrix in a non-differentiable form, which makes it difficult to construct a differentiable loss function that directly optimizes them. The lack of a solution to this challenge not only hinders our ability to solve difficult tasks, such as imbalanced learning, but also forces model selection to rely on computationally expensive hyperparameter searches. In this paper, we propose \textit{AnyLoss}, a general-purpose approach that transforms any confusion matrix-based metric into a loss function usable in optimization. To this end, we use an approximation function that represents the confusion matrix in a differentiable form, enabling any confusion matrix-based metric to be used directly as a loss function. We describe the mechanism of the approximation function to establish its operability, and we prove the differentiability of our loss functions by deriving their derivatives. Extensive experiments with diverse neural networks on many datasets demonstrate that our approach can target any confusion matrix-based metric. In particular, our method achieves outstanding results on imbalanced datasets, and its competitive learning speed relative to multiple baseline models underscores its efficiency.
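To make the core idea concrete, the following is a minimal sketch of a soft (differentiable) confusion matrix and a metric-based loss built on it. The specific amplifying function and its steepness constant here are illustrative assumptions, not necessarily the exact formulation used in the paper: predicted probabilities are pushed toward 0/1 by a steepened sigmoid, the four confusion-matrix entries are computed as smooth sums, and an F1-style loss follows directly.

```python
import numpy as np

def amplify(p, steepness=75.0):
    # Steepened sigmoid that pushes probabilities toward 0 or 1
    # while remaining differentiable (illustrative choice of
    # steepness; a hard threshold here would break gradients).
    return 1.0 / (1.0 + np.exp(-steepness * (p - 0.5)))

def soft_confusion(y_true, y_prob, steepness=75.0):
    # Smooth approximations of TP, FN, FP, TN: each entry is a sum
    # of products of labels and amplified probabilities, so the
    # whole matrix is differentiable with respect to y_prob.
    a = amplify(np.asarray(y_prob, dtype=float), steepness)
    y = np.asarray(y_true, dtype=float)
    tp = np.sum(y * a)
    fn = np.sum(y * (1.0 - a))
    fp = np.sum((1.0 - y) * a)
    tn = np.sum((1.0 - y) * (1.0 - a))
    return tp, fn, fp, tn

def f1_loss(y_true, y_prob, eps=1e-7):
    # Any confusion matrix-based metric can be plugged in here;
    # F1 is shown as one example. Loss = 1 - metric.
    tp, fn, fp, _ = soft_confusion(y_true, y_prob)
    f1 = 2.0 * tp / (2.0 * tp + fp + fn + eps)
    return 1.0 - f1
```

Because every step is a smooth function of the predicted probabilities, the same construction yields a usable gradient for any metric expressed in terms of TP, FN, FP, and TN (e.g., accuracy, balanced accuracy, or F-beta scores).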