Although sparse training has been successfully used in various resource-limited deep learning tasks to save memory, accelerate training, and reduce inference time, the reliability of the resulting sparse models remains unexplored. Previous research has shown that deep neural networks tend to be over-confident, and we find that sparse training exacerbates this problem. Calibrating sparse models is therefore crucial for reliable prediction and decision-making. In this paper, we propose a new sparse training method that produces sparse models with improved confidence calibration. Whereas previous work uses a single mask to control the sparse topology, our method uses two masks: a deterministic mask and a random mask. The former efficiently searches for and activates important weights by exploiting the magnitudes of weights and gradients, while the latter improves exploration and finds more appropriate weight values through random updates. Theoretically, we prove that our method can be viewed as a hierarchical variational approximation of a probabilistic deep Gaussian process. Extensive experiments across multiple datasets, model architectures, and sparsity levels show that our method reduces the expected calibration error (ECE) by up to 47.8\% while maintaining or even improving accuracy, at the cost of only a slight increase in computation and storage.
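To make the two-mask idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation and makes several assumptions. The deterministic mask is updated in a RigL-style prune-and-grow step: it drops the active weights with the smallest magnitude and activates the inactive weights with the largest gradient magnitude. The random mask is modeled as a Bernoulli mask applied on top of the deterministic one. The function names `update_deterministic_mask` and `sample_random_mask` and the parameters `drop_fraction` and `keep_prob` are hypothetical.

```python
import numpy as np


def update_deterministic_mask(weights, grads, mask, drop_fraction):
    """Hypothetical RigL-style prune-and-grow update for the deterministic mask.

    Drops the active weights with the smallest |w| and activates the same
    number of previously inactive weights with the largest |grad|, so the
    number of active weights (the sparsity level) stays constant.
    """
    mask = mask.copy()
    active = np.flatnonzero(mask)
    inactive_before = np.flatnonzero(~mask)  # grow candidates: inactive before this step
    n_swap = min(int(drop_fraction * active.size), inactive_before.size)
    if n_swap == 0:
        return mask
    # Prune: active weights with the smallest magnitude.
    drop = active[np.argsort(np.abs(weights[active]))[:n_swap]]
    mask[drop] = False
    # Grow: previously inactive weights with the largest gradient magnitude.
    grow = inactive_before[np.argsort(-np.abs(grads[inactive_before]))[:n_swap]]
    mask[grow] = True
    return mask


def sample_random_mask(det_mask, keep_prob, rng):
    """Assumed random mask: keep each deterministically active weight with prob keep_prob."""
    return det_mask & (rng.random(det_mask.shape) < keep_prob)


# Usage on a toy flattened layer with 20 weights at 75% sparsity.
rng = np.random.default_rng(0)
n = 20
weights = rng.normal(size=n)
grads = rng.normal(size=n)
mask = np.zeros(n, dtype=bool)
mask[np.argsort(-np.abs(weights))[:5]] = True  # start from the top-5 magnitudes
mask = update_deterministic_mask(weights, grads, mask, drop_fraction=0.4)
effective = sample_random_mask(mask, keep_prob=0.8, rng=rng)
sparse_weights = weights * effective  # weights used in the forward pass
```

Composing the masks with `&` guarantees that the random mask can only remove weights the deterministic mask has selected. Under this assumption, the deterministic mask alone fixes the sparsity budget and the random mask adds stochastic exploration on top of it. In RigL, newly grown weights are also initialized to zero. That step is omitted here because the sketch returns only masks.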
翻译:尽管稀疏训练已成功应用于各类资源受限的深度学习任务中,以节省内存、加速训练并降低推理时间,但所生成稀疏模型的可靠性仍未得到探索。先前研究表明,深度神经网络往往过度自信,而我们发现稀疏训练会加剧这一问题。因此,校准稀疏模型对于可靠的预测与决策至关重要。本文提出一种新的稀疏训练方法,可生成具有改进置信度校准的稀疏模型。与以往仅使用单一掩码控制稀疏拓扑的研究不同,我们的方法采用双重掩码,包括确定性掩码与随机掩码。前者通过利用权重和梯度的幅值高效搜索并激活重要权重;后者则通过随机更新实现更优探索,从而发现更合适的权重值。理论上,我们证明该方法可视为概率深度高斯过程的分层变分近似。在多个数据集、模型架构及稀疏度上的大量实验表明,本方法将ECE值最高降低47.8%,同时在仅略微增加计算与存储负担的情况下,保持甚至提升准确性。