Low-precision training has emerged as a promising, low-cost technique for improving the training efficiency of deep neural networks without sacrificing much accuracy. Its Bayesian counterpart can further provide uncertainty quantification and improved generalization accuracy. This paper investigates low-precision sampling via Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) with low-precision and full-precision gradient accumulators for both strongly log-concave and non-log-concave distributions. Theoretically, our results show that, to achieve $\epsilon$-error in the 2-Wasserstein distance for non-log-concave distributions, low-precision SGHMC achieves a quadratic improvement ($\widetilde{\mathbf{O}}\left({\epsilon^{-2}{\mu^*}^{-2}\log^2\left({\epsilon^{-1}}\right)}\right)$) over the state-of-the-art low-precision sampler, Stochastic Gradient Langevin Dynamics (SGLD) ($\widetilde{\mathbf{O}}\left({{\epsilon}^{-4}{\lambda^{*}}^{-1}\log^5\left({\epsilon^{-1}}\right)}\right)$). Moreover, we prove that low-precision SGHMC is more robust to quantization error than low-precision SGLD, owing to the robustness of the momentum-based update with respect to gradient noise. Empirically, we conduct experiments on synthetic data and the MNIST, CIFAR-10, and CIFAR-100 datasets, which validate our theoretical findings. Our study highlights the potential of low-precision SGHMC as an efficient and accurate sampling method for large-scale and resource-limited machine learning.
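For concreteness, the following is a minimal sketch of the kind of update the abstract refers to: a standard Euler-type SGHMC discretization in which the weights are stored in low precision. The quantizer $Q_W$, step size $\eta$, friction $\gamma$, and stochastic gradient $\widetilde{\nabla} U$ are illustrative notation introduced here, not definitions taken from the paper.
\begin{align*}
v_{k+1} &= (1-\eta\gamma)\,v_k - \eta\,\widetilde{\nabla} U(\theta_k) + \sqrt{2\gamma\eta}\,\xi_k, \qquad \xi_k \sim \mathcal{N}(0, I),\\
\theta_{k+1} &= Q_W\!\left(\theta_k + \eta\, v_{k+1}\right) \quad \text{(low-precision gradient accumulator)},
\end{align*}
whereas with a full-precision gradient accumulator the position update $\theta_{k+1} = \theta_k + \eta\, v_{k+1}$ is kept in full precision and only the copy of $\theta_k$ used to evaluate $\widetilde{\nabla} U$ is quantized.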