The areas under the ROC curve (AUROC) and the precision-recall curve (AUPRC) are common metrics for evaluating classification performance on imbalanced problems. Compared with AUROC, AUPRC is a more appropriate metric for highly imbalanced datasets. While stochastic optimization of AUROC has been studied extensively, principled stochastic optimization of AUPRC has rarely been explored. In this work, we propose a principled technical method to optimize AUPRC for deep learning. Our approach is based on maximizing the average precision (AP), which is an unbiased point estimator of AUPRC. We cast the objective as a sum of {\it dependent compositional functions} whose inner functions depend on random variables of the outer level. By leveraging recent advances in stochastic compositional optimization, we propose efficient adaptive and non-adaptive stochastic algorithms, named SOAP, with {\it provable convergence guarantees under mild conditions}. Extensive experimental results on image and graph datasets demonstrate that our proposed method outperforms prior methods on imbalanced problems in terms of AUPRC. To the best of our knowledge, our work represents the first attempt to optimize AUPRC with provable convergence. SOAP has been implemented in the libAUC library at~\url{https://libauc.org/}.
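As a rough illustration of the quantity the abstract proposes to maximize, the sketch below computes average precision (AP) on a finite sample with NumPy. It assumes the common definition of AP as the mean, over positive examples, of the precision obtained when thresholding at that example's score. The function name `average_precision` and the tie-handling via `>=` are choices made for this illustration; this is not the SOAP algorithm, nor the libAUC API.

```python
import numpy as np


def average_precision(scores, labels):
    """Mean over positives of precision at that positive's score (illustrative)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    pos = labels == 1
    if not pos.any():
        raise ValueError("need at least one positive example")

    # ge[i, s] is True when example s scores at least as high as the i-th positive.
    ge = scores[None, :] >= scores[pos][:, None]

    # Numerator: positives ranked at or above the i-th positive.
    true_pos = (ge & pos[None, :]).sum(axis=1)
    # Denominator: all examples ranked at or above the i-th positive.
    predicted_pos = ge.sum(axis=1)

    return float(np.mean(true_pos / predicted_pos))


if __name__ == "__main__":
    print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

Each term is a ratio of two sums over the whole dataset, and both sums depend on which positive example is being scored. That structure is what the abstract calls a sum of dependent compositional functions, and it is why a plain mini-batch gradient of this quantity is not straightforward to obtain.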