Convolutional neural networks (CNNs) provide flexible function approximations for a wide variety of applications when the input variables are in the form of images or spatial data. Although CNNs often outperform traditional statistical models in prediction accuracy, statistical inference, such as estimating the effects of covariates and quantifying prediction uncertainty, is not trivial because of the highly complicated model structure and overparameterization. To address this challenge, we propose a new Bayesian approach that embeds CNNs within the generalized linear model (GLM) framework. We use nodes extracted from the last hidden layer of a CNN with Monte Carlo (MC) dropout as informative covariates in the GLM. This improves the accuracy of both prediction and regression coefficient inference, allowing for the interpretation of coefficients and uncertainty quantification. By fitting an ensemble of GLMs across multiple realizations from MC dropout, we account for the uncertainty in feature extraction. We apply our methods to biological and epidemiological problems that have both high-dimensional correlated inputs and vector covariates. Specifically, we consider malaria incidence data, brain tumor image data, and fMRI data. By extracting information from correlated inputs, the proposed method provides an interpretable Bayesian analysis. Because it enables fast and accurate Bayesian inference, the algorithm is broadly applicable to image regression and correlated data analysis.
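The pipeline described in the abstract (MC dropout feature extraction, one GLM per dropout realization, pooled inference) can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the data are simulated, two fixed random dense layers stand in for a trained CNN, the GLM is Gaussian with an identity link, and the posterior is a flat-prior normal approximation instead of a full Bayesian sampler. The names `mc_dropout_features` and `glm_posterior_draws` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data (hypothetical): flattened 5x5 "images" plus one scalar covariate z.
n, n_pixels, n_hidden, n_features = 200, 25, 16, 3
images = rng.normal(size=(n, n_pixels))
z = rng.normal(size=n)
y = 1.0 + 2.0 * z + 2.0 * images.mean(axis=1) + rng.normal(scale=0.5, size=n)

# Stand-in for a trained CNN: two fixed random dense layers (an assumption).
w1 = rng.normal(size=(n_pixels, n_hidden)) / np.sqrt(n_pixels)
w2 = rng.normal(size=(n_hidden, n_features)) / np.sqrt(n_hidden)


def mc_dropout_features(x, rate, rng):
    """One stochastic forward pass; returns last-hidden-layer activations."""
    h = np.maximum(x @ w1, 0.0)          # first hidden layer (ReLU)
    mask = rng.random(h.shape) >= rate   # dropout stays active at prediction time
    h = h * mask / (1.0 - rate)          # inverted-dropout rescaling
    return np.tanh(h @ w2)               # features handed to the GLM


def glm_posterior_draws(design, response, n_draws, rng):
    """Approximate posterior draws for a Gaussian GLM under a flat prior.

    Uses the normal approximation N(beta_hat, s2 * (X'X)^-1), with the
    noise variance s2 plugged in rather than sampled.
    """
    xtx_inv = np.linalg.inv(design.T @ design)
    beta_hat = xtx_inv @ design.T @ response
    resid = response - design @ beta_hat
    s2 = resid @ resid / (len(response) - design.shape[1])
    cov = s2 * xtx_inv
    cov = (cov + cov.T) / 2.0            # remove floating-point asymmetry
    return rng.multivariate_normal(beta_hat, cov, size=n_draws)


# Ensemble: fit one GLM per MC-dropout realization, then pool the draws.
n_realizations, n_draws = 20, 200
pooled = []
for _ in range(n_realizations):
    feats = mc_dropout_features(images, rate=0.2, rng=rng)
    design = np.column_stack([np.ones(n), z, feats])  # intercept, covariate, features
    pooled.append(glm_posterior_draws(design, y, n_draws, rng))
pooled = np.vstack(pooled)

# Pooled summary for the coefficient of the vector covariate z.
z_mean = pooled[:, 1].mean()
z_lo, z_hi = np.percentile(pooled[:, 1], [2.5, 97.5])
print(f"z coefficient: mean={z_mean:.2f}, 95% interval=({z_lo:.2f}, {z_hi:.2f})")
```

Pooling draws across realizations treats the ensemble as an equally weighted mixture, so the resulting interval reflects both the coefficient uncertainty within each GLM and the variability introduced by dropout in the extracted features. A non-Gaussian response (for example, counts) would need a different likelihood and fitting routine than the least-squares step used here.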