Artificial neural networks (ANNs) are powerful machine learning methods used in many modern applications, such as facial recognition, machine translation, and cancer diagnostics. A common issue with ANNs is that they usually have millions or billions of trainable parameters and therefore tend to overfit the training data. This is especially problematic in applications where reliable uncertainty estimates are important. Bayesian neural networks (BNNs) can improve on this, since they incorporate parameter uncertainty. In addition, latent binary Bayesian neural networks (LBBNNs) take structural uncertainty into account by allowing the weights to be turned on or off, enabling inference in the joint space of weights and structures. In this paper, we consider two extensions to the LBBNN method. First, by using the local reparametrization trick (LRT) to sample the hidden units directly, we obtain a more computationally efficient algorithm. Second, and more importantly, by using normalizing flows on the variational posterior distribution of the LBBNN parameters, the network learns a more flexible variational posterior distribution than the mean-field Gaussian. Experimental results show that this improves predictive power compared to the LBBNN method, while also yielding sparser networks. We perform two simulation studies. In the first, we consider variable selection in a logistic regression setting, where the more flexible variational distribution leads to improved results. In the second, we compare predictive uncertainty based on data generated from two-dimensional Gaussian distributions. Here, we argue that our Bayesian methods lead to more realistic estimates of predictive uncertainty.
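To make the idea of sampling hidden units directly more concrete, the following is a minimal sketch of a local-reparametrization step for a single layer. It is not taken from the paper. It assumes each weight is the product of a Bernoulli inclusion indicator with probability `alpha` and a Gaussian weight with mean `mu` and standard deviation `sigma`, all independent, and that the pre-activation is approximated by a Gaussian with the matching mean and variance. The function name `lrt_preactivations` and all parameter names are hypothetical.

```python
import numpy as np

def lrt_preactivations(x, mu, sigma, alpha, rng):
    """Sample layer pre-activations directly instead of sampling weights.

    Assumed (illustrative) model: w_ij = gamma_ij * w~_ij with
    gamma_ij ~ Bernoulli(alpha_ij) and w~_ij ~ N(mu_ij, sigma_ij^2).
    Shapes: x is (batch, n_in); mu, sigma, alpha are (n_in, n_out).
    """
    # E[w_ij] = alpha_ij * mu_ij
    mean = x @ (alpha * mu)
    # Var[w_ij] = alpha_ij * (sigma_ij^2 + (1 - alpha_ij) * mu_ij^2)
    var = (x ** 2) @ (alpha * (sigma ** 2 + (1.0 - alpha) * mu ** 2))
    # One noise draw per hidden unit, rather than one per weight
    eps = rng.standard_normal(mean.shape)
    return mean + np.sqrt(var) * eps

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 3))      # 4 inputs with 3 features each
mu = rng.standard_normal((3, 2))     # variational means
sigma = np.full((3, 2), 0.1)         # variational standard deviations
alpha = np.full((3, 2), 0.5)         # inclusion probabilities
h = lrt_preactivations(x, mu, sigma, alpha, rng)
```

Under these assumptions, the noise is drawn once per hidden unit per input rather than once per weight, which is where the computational saving would come from.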