Receiver Operating Characteristic (ROC) curves plot true positive rate against false positive rate and are used to evaluate binary classification algorithms. Because the Area Under the Curve (AUC) is a piecewise constant function of the predicted values, its gradient is zero almost everywhere, so learning algorithms instead optimize convex relaxations that involve a sum over all pairs of labeled positive and negative examples. Naive learning algorithms compute the gradient in time quadratic in the number of examples, which is too slow for learning with large batch sizes. We propose a new functional representation of the square loss and squared hinge loss, which yields algorithms that compute the gradient in either linear or log-linear time and makes gradient descent learning with large batch sizes possible. In our empirical study of supervised binary classification problems, we show that our new algorithm can achieve higher test AUC values on imbalanced data sets than previous algorithms, and can make use of larger batch sizes than were previously feasible.
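To make the complexity claim concrete, the following is a minimal sketch, not the authors' implementation, of how an all-pairs squared hinge loss can be evaluated in log-linear time by sorting. It assumes the loss for a positive prediction p and a negative prediction q is max(0, margin - (p - q))^2 summed over all such pairs; the function names, the `margin` parameter, and the label coding (1 for positive, anything else for negative) are illustrative assumptions. Only the loss value is computed here; the gradient can be accumulated with similar running sums.

```python
def pairwise_squared_hinge_naive(preds, labels, margin=1.0):
    """Quadratic-time reference: loop over every (positive, negative) pair."""
    pos = [p for p, y in zip(preds, labels) if y == 1]
    neg = [p for p, y in zip(preds, labels) if y != 1]
    total = 0.0
    for p in pos:
        for q in neg:
            total += max(0.0, margin - (p - q)) ** 2
    return total


def pairwise_squared_hinge_sorted(preds, labels, margin=1.0):
    """Log-linear version: one sort followed by a single linear sweep.

    Shifting each negative prediction up by the margin turns the pair loss
    into (b - a)^2 whenever b > a, where a is a positive prediction and b is
    a shifted negative one. Expanding the square gives
    sum(b^2) - 2 * a * sum(b) + count * a^2 over the negatives with b > a,
    so three running sums are enough.
    """
    items = sorted(
        ((p if y == 1 else p + margin, y == 1) for p, y in zip(preds, labels)),
        key=lambda item: item[0],
        reverse=True,  # sweep from the largest shifted value downward
    )
    count = 0.0   # number of negatives seen so far
    sum_b = 0.0   # sum of their shifted values
    sum_b2 = 0.0  # sum of their squared shifted values
    total = 0.0
    for value, is_positive in items:
        if is_positive:
            # All negatives seen so far have shifted value >= this positive;
            # ties contribute exactly zero, so their order does not matter.
            total += sum_b2 - 2.0 * value * sum_b + count * value * value
        else:
            count += 1.0
            sum_b += value
            sum_b2 += value * value
    return total
```

Both functions should agree up to floating-point rounding; the sorted version avoids materializing the pairs, which is what makes large batches affordable in this sketch.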