Many evaluation metrics can be used to assess the performance of models in binary classification tasks. However, most of them are derived from a confusion matrix in a non-differentiable form, which makes it difficult to construct a differentiable loss function that directly optimizes them. The lack of a solution to this challenge not only hinders our ability to solve difficult tasks, such as imbalanced learning, but also forces model selection to rely on computationally expensive hyperparameter searches. In this paper, we propose \textit{AnyLoss}, a general-purpose approach that transforms any confusion matrix-based metric into a loss function usable in optimization. To this end, we use an approximation function that represents the confusion matrix in a differentiable form, enabling any confusion matrix-based metric to be used directly as a loss function. We describe the mechanism of the approximation function to establish its operability, and we prove the differentiability of our loss functions by deriving their derivatives. Extensive experiments with diverse neural networks on many datasets demonstrate that our approach can target any confusion matrix-based metric. In particular, our method achieves outstanding results on imbalanced datasets, and its competitive learning speed relative to multiple baseline models underscores its efficiency.
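To make the core idea concrete, the following is a minimal sketch of a soft (differentiable) confusion matrix and a metric-based loss built on it. The specific amplifying function and its steepness constant here are illustrative assumptions, not necessarily the exact formulation used in the paper: predicted probabilities are pushed toward 0/1 by a steepened sigmoid, the four confusion-matrix entries are computed as smooth sums, and an F1-style loss follows directly.

```python
import numpy as np

def amplify(p, steepness=75.0):
    # Steepened sigmoid that pushes probabilities toward 0 or 1
    # while remaining differentiable (illustrative choice of
    # steepness; a hard threshold here would break gradients).
    return 1.0 / (1.0 + np.exp(-steepness * (p - 0.5)))

def soft_confusion(y_true, y_prob, steepness=75.0):
    # Smooth approximations of TP, FN, FP, TN: each entry is a sum
    # of products of labels and amplified probabilities, so the
    # whole matrix is differentiable with respect to y_prob.
    a = amplify(np.asarray(y_prob, dtype=float), steepness)
    y = np.asarray(y_true, dtype=float)
    tp = np.sum(y * a)
    fn = np.sum(y * (1.0 - a))
    fp = np.sum((1.0 - y) * a)
    tn = np.sum((1.0 - y) * (1.0 - a))
    return tp, fn, fp, tn

def f1_loss(y_true, y_prob, eps=1e-7):
    # Any confusion matrix-based metric can be plugged in here;
    # F1 is shown as one example. Loss = 1 - metric.
    tp, fn, fp, _ = soft_confusion(y_true, y_prob)
    f1 = 2.0 * tp / (2.0 * tp + fp + fn + eps)
    return 1.0 - f1
```

Because every step is a smooth function of the predicted probabilities, the same construction yields a usable gradient for any metric expressed in terms of TP, FN, FP, and TN (e.g., accuracy, balanced accuracy, or F-beta scores).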