The Area Under the ROC Curve (AUC) is a widely used metric in long-tailed classification. However, most existing methods assume that training and testing examples are drawn i.i.d. from the same distribution, an assumption that often fails in practice. Distributionally Robust Optimization (DRO) improves robustness by optimizing the model against the local worst-case distribution, but directly combining AUC optimization with DRO yields an intractable optimization problem. To tackle this challenge, we methodically derive an instance-wise surrogate loss for Distributionally Robust AUC (DRAUC) and build our optimization framework on top of it. Moreover, we show that conventional DRAUC may induce label bias, and we therefore introduce distribution-aware DRAUC as a more suitable metric for robust AUC learning. Theoretically, we prove that the generalization gap between the training loss and the testing error diminishes when the training set is sufficiently large. Empirically, experiments on corrupted benchmark datasets demonstrate the effectiveness of the proposed method. Code is available at: https://github.com/EldercatSAM/DRAUC.
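For orientation, the sketch below shows what "AUC" and a standard differentiable AUC surrogate look like on a finite sample. It is not the DRAUC method or the code in the linked repository. The function names `empirical_auc` and `pairwise_square_surrogate` and the `margin` parameter are hypothetical, chosen for illustration. The point it illustrates is that both quantities average over positive-negative *pairs* rather than over individual examples. This pairwise coupling is one way to see why AUC does not plug directly into DRO formulations that reweight or perturb single examples, and why an instance-wise surrogate is useful.

```python
from itertools import product


def _split_by_label(scores, labels):
    """Separate scores into positive (label 1) and negative (label 0) lists."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    return pos, neg


def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 1/2."""
    pos, neg = _split_by_label(scores, labels)
    correct = 0.0
    for sp, sn in product(pos, neg):
        if sp > sn:
            correct += 1.0
        elif sp == sn:
            correct += 0.5
    return correct / (len(pos) * len(neg))


def pairwise_square_surrogate(scores, labels, margin=1.0):
    """Hypothetical illustration: mean of (margin - (s_pos - s_neg))^2 over all pairs.

    The square loss is one common convex surrogate for 1 - AUC; it is
    minimized when every positive outscores every negative by `margin`.
    """
    pos, neg = _split_by_label(scores, labels)
    total = sum((margin - (sp - sn)) ** 2 for sp, sn in product(pos, neg))
    return total / (len(pos) * len(neg))


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.3, 0.1]
    labels = [1, 0, 1, 0]
    print(empirical_auc(scores, labels))
    print(pairwise_square_surrogate(scores, labels))
```

On the toy data above, three of the four positive-negative pairs are ordered correctly, so the empirical AUC is 0.75. Note that the cost of both functions grows with the number of pairs, not the number of examples, which is another practical reason to prefer instance-wise reformulations.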