Neural networks (NNs) are primarily developed within the frequentist statistical framework. However, frequentist NNs cannot quantify the uncertainty in their predictions, so their robustness cannot be adequately assessed. Bayesian neural networks (BNNs), in contrast, naturally provide predictive uncertainty through Bayes' theorem, but their computational requirements pose significant challenges. Moreover, both frequentist NNs and BNNs suffer from overfitting when trained on noisy and sparse data, which renders their predictions unreliable away from the available data. To address both problems simultaneously, we draw on a hierarchical setting, in which the parameter priors are conditional on hyperparameters, and construct a BNN using a semi-analytical framework known as nonlinear sparse Bayesian learning (NSBL). We call the resulting network a sparse Bayesian neural network (SBNN); it aims to address the practical and computational issues associated with BNNs. In addition, imposing a sparsity-inducing prior encourages the automatic pruning of redundant parameters, following the automatic relevance determination (ARD) concept. Redundant parameters are removed by optimally selecting the precisions of the parameter prior probability density functions (pdfs), which yields a tractable treatment of overfitting. To demonstrate the benefits of the SBNN algorithm, we present an illustrative regression problem and compare the results of a BNN trained with standard Bayesian inference, a BNN trained with hierarchical Bayesian inference, and a BNN equipped with the proposed algorithm. We then demonstrate the importance of considering the full parameter posterior by comparing these results with those obtained using the Laplace approximation, with and without NSBL.
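For orientation, the hierarchical ARD construction referred to above can be sketched in its generic textbook form. The notation here is an assumption introduced for illustration and is not taken from the abstract: $\theta \in \mathbb{R}^{N}$ denotes the network parameters, $\alpha_i$ the prior precision of $\theta_i$, $\mathcal{D}$ the data, and $p(\alpha)$ a hyperprior. How the evidence integral is evaluated semi-analytically in NSBL is not specified in the abstract and is not reproduced here.

```latex
% Generic ARD sketch under assumed notation (not the paper's exact formulation).
\begin{align}
  % Sparsity-inducing prior: one precision hyperparameter per parameter
  p(\theta \mid \alpha) &= \prod_{i=1}^{N} \mathcal{N}\!\left(\theta_i \mid 0,\ \alpha_i^{-1}\right), \\
  % Parameter posterior conditional on the hyperparameters
  p(\theta \mid \mathcal{D}, \alpha) &\propto p(\mathcal{D} \mid \theta)\, p(\theta \mid \alpha), \\
  % Evidence as a function of the precisions
  p(\mathcal{D} \mid \alpha) &= \int p(\mathcal{D} \mid \theta)\, p(\theta \mid \alpha)\, \mathrm{d}\theta, \\
  % Optimal selection of the prior precisions
  \hat{\alpha} &= \arg\max_{\alpha}\ \bigl[\log p(\mathcal{D} \mid \alpha) + \log p(\alpha)\bigr].
\end{align}
```

In this generic picture, a parameter is pruned when its optimal precision grows without bound: as $\hat{\alpha}_i \to \infty$, the prior on $\theta_i$ collapses onto zero, and so does its posterior, which effectively removes $\theta_i$ from the model.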