Convolutional neural networks (CNNs) provide flexible function approximations for a wide variety of applications when the input variables are in the form of images or spatial data. Although CNNs often outperform traditional statistical models in prediction accuracy, statistical inference, such as estimating the effects of covariates and quantifying the prediction uncertainty, is not trivial due to the highly complicated model structure and overparameterization. To address this challenge, we propose a new Bayesian approach that embeds CNNs within the generalized linear model (GLM) framework. We use nodes extracted from the last hidden layer of a CNN with Monte Carlo (MC) dropout as informative covariates in the GLM. This improves the accuracy of both prediction and regression coefficient inference, and it allows the coefficients to be interpreted and the uncertainty to be quantified. By fitting an ensemble of GLMs across multiple realizations from MC dropout, we can account for the uncertainty in extracting the features. We apply our methods to biological and epidemiological problems that have both high-dimensional correlated inputs and vector covariates. Specifically, we consider malaria incidence data, brain tumor image data, and fMRI data. By extracting information from correlated inputs, the proposed method can provide an interpretable Bayesian analysis. The algorithm is broadly applicable to image regression and correlated data analysis because it enables fast and accurate Bayesian inference.
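The pipeline described in the abstract (MC-dropout feature extraction, one GLM per dropout realization, then pooling across realizations) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the data are synthetic, a single random `tanh` layer with input dropout stands in for the last hidden layer of a trained CNN, the GLM is Gaussian with identity link and a flat prior, and the pooling rule (moments of an equally weighted mixture of the per-realization posteriors) is one plausible way to combine the ensemble. The names `extract_features`, `fit_gaussian_glm`, and `ensemble_glm` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins (assumptions): n "images" flattened to p pixels,
# q ordinary vector covariates Z, and h hidden nodes used as features.
n, p, q, h = 200, 64, 2, 4
images = rng.normal(size=(n, p))
Z = rng.normal(size=(n, q))
W = rng.normal(size=(p, h)) / np.sqrt(p)  # stands in for trained CNN weights
true_gamma = np.array([1.5, -0.5])        # effects of the vector covariates
y = (Z @ true_gamma + np.tanh(images @ W).sum(axis=1)
     + rng.normal(scale=0.3, size=n))


def extract_features(x, weights, drop_rate, rng):
    """One MC-dropout pass: dropout stays active at prediction time."""
    mask = rng.random(x.shape) >= drop_rate          # keep each unit w.p. 1 - drop_rate
    return np.tanh((x * mask / (1.0 - drop_rate)) @ weights)


def fit_gaussian_glm(X, y):
    """Gaussian GLM with identity link. Under a flat prior the posterior of
    the coefficients is approximately N(beta_hat, s2 * (X'X)^-1)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y
    resid = y - X @ beta_hat
    s2 = resid @ resid / (X.shape[0] - X.shape[1])
    return beta_hat, s2 * XtX_inv


def ensemble_glm(images, Z, y, weights, rng, n_draws=50, drop_rate=0.2):
    """Fit one GLM per MC-dropout realization and pool the results."""
    means, variances = [], []
    for _ in range(n_draws):
        feats = extract_features(images, weights, drop_rate, rng)
        X = np.column_stack([np.ones(len(y)), Z, feats])  # intercept, Z, features
        beta_hat, cov = fit_gaussian_glm(X, y)
        means.append(beta_hat)
        variances.append(np.diag(cov))
    means = np.array(means)
    pooled_mean = means.mean(axis=0)
    # Mixture moments: average within-fit variance plus between-fit variance.
    pooled_var = np.mean(variances, axis=0) + means.var(axis=0)
    return pooled_mean, pooled_var


pooled_mean, pooled_var = ensemble_glm(images, Z, y, W, rng)
gamma_hat = pooled_mean[1:1 + q]          # pooled estimates for the Z effects
gamma_sd = np.sqrt(pooled_var[1:1 + q])   # pooled posterior standard deviations
```

In this sketch, `gamma_hat` and `gamma_sd` play the role of the interpretable coefficient estimates and their uncertainty. The between-fit term in `pooled_var` is what carries the feature-extraction uncertainty from MC dropout into the final intervals. A non-Gaussian response (for example, counts such as malaria incidence) would need a different GLM family and fitting routine than the closed-form one assumed here.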