Delivering meaningful uncertainty estimates is essential for the successful deployment of machine learning models in clinical practice. A central aspect of uncertainty quantification is the ability of a model to return predictions that are well aligned with the actual probability of the model being correct, also known as model calibration. Although many methods have been proposed to improve calibration, no technique can match the simple but expensive approach of training an ensemble of deep neural networks. In this paper we introduce a form of simplified ensembling that bypasses the costly training and inference of deep ensembles while retaining their calibration capabilities. The idea is to replace the common linear classifier at the end of a network with a set of heads that are supervised with different loss functions to enforce diversity in their predictions. Specifically, each head is trained to minimize a weighted Cross-Entropy loss, but the weights differ across branches. We show that the resulting averaged predictions can achieve excellent calibration without sacrificing accuracy on two challenging datasets for histopathological and endoscopic image classification. Our experiments indicate that Multi-Head Multi-Loss classifiers are inherently well calibrated, outperforming other recent calibration techniques and even challenging the performance of Deep Ensembles. Code to reproduce our experiments can be found at \url{https://github.com/agaldran/mhml_calibration}.
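The multi-head, multi-loss training objective and the averaged prediction described above can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation (that lives in the linked repository). The helper `make_head_weights` and its scheme (head `h` up-weights class `h mod K` by a factor `boost`) are hypothetical and chosen only to make the per-head weights differ. The weighted loss is normalized by the sum of the target-class weights, which mirrors the usual PyTorch convention for weighted cross-entropy with mean reduction.

```python
import math
from typing import List, Sequence


def make_head_weights(num_heads: int, num_classes: int, boost: float = 2.0) -> List[List[float]]:
    """Hypothetical weighting scheme: head h up-weights class (h mod K) by `boost`."""
    weights = []
    for h in range(num_heads):
        w = [1.0] * num_classes
        w[h % num_classes] = boost
        weights.append(w)
    return weights


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max logit for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_cross_entropy(batch_logits: Sequence[Sequence[float]],
                           targets: Sequence[int],
                           class_weights: Sequence[float]) -> float:
    """Weighted CE: sum_i w[y_i] * -log p_i[y_i], divided by sum_i w[y_i]."""
    num, den = 0.0, 0.0
    for logits, y in zip(batch_logits, targets):
        p = softmax(logits)
        num += class_weights[y] * -math.log(p[y])
        den += class_weights[y]
    return num / den


def multi_head_multi_loss(head_logits: Sequence[Sequence[Sequence[float]]],
                          targets: Sequence[int],
                          head_weights: Sequence[Sequence[float]]) -> float:
    """Total training loss: each head h is scored with its own weighted CE.

    head_logits[h][i] holds the logits produced by head h for sample i.
    """
    return sum(
        weighted_cross_entropy(logits, targets, w)
        for logits, w in zip(head_logits, head_weights)
    )


def averaged_prediction(per_head_logits: Sequence[Sequence[float]]) -> List[float]:
    """Final prediction for one sample: mean of the per-head softmax outputs."""
    probs = [softmax(logits) for logits in per_head_logits]
    n = len(probs)
    return [sum(p[k] for p in probs) / n for k in range(len(probs[0]))]


if __name__ == "__main__":
    # Two heads, three classes, a batch of two samples (made-up logits).
    head_logits = [
        [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]],   # head 0
        [[1.0, 1.2, -0.5], [-0.2, 0.8, 1.1]],  # head 1
    ]
    targets = [0, 1]
    weights = make_head_weights(num_heads=2, num_classes=3)
    print("training loss:", multi_head_multi_loss(head_logits, targets, weights))
    print("averaged probs, sample 0:",
          averaged_prediction([head_logits[0][0], head_logits[1][0]]))
```

In a real network, each head would typically be a separate linear layer on top of the shared backbone features. The per-head losses are summed during training, and only the averaged probabilities are used at inference time. Because every head sees the same features, this costs far less than training and running several independent networks.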