Supervised training of deep neural networks for classification typically relies on hard targets, which promote overconfidence and can limit calibration, generalization, and robustness. Self-distillation methods aim to mitigate this by leveraging the inter-class and sample-specific information present in the model's own predictions, but they often remain dependent on hard targets and do not explicitly model predictive uncertainty. With this in mind, we propose Deep Probabilistic Supervision (DPS), a principled learning framework that constructs sample-specific target distributions via statistical inference on the model's own predictions and remains independent of hard targets after initialization. We show that DPS consistently yields higher test accuracy (e.g., +2.0% for DenseNet-264 on ImageNet) and substantially lower Expected Calibration Error (ECE) (e.g., −40% for ResNet-50 on CIFAR-100) than existing self-distillation methods. When combined with a contrastive loss, DPS achieves state-of-the-art robustness under label noise.