Generalized linear models (GLMs) are arguably the standard approach to statistical regression beyond the Gaussian likelihood setting. In Bayesian formulations, the general absence of a tractable posterior distribution has motivated the development of deterministic approximations, which are generally more scalable than sampling techniques. Among them, expectation propagation (EP) has shown remarkable accuracy, usually higher than that of many variational Bayes solutions. However, the higher computational cost of EP has raised concerns about its practical feasibility, especially in high-dimensional settings. We address these concerns by deriving a novel efficient formulation of EP for GLMs, whose cost scales linearly in the number of covariates p. This reduces the state-of-the-art O(p^2 n) per-iteration computational cost of the EP routine for GLMs to O(p n min{p,n}), with n being the sample size. We also show that, for binary models and log-linear GLMs, approximate predictive means can be obtained at no additional cost. To preserve efficient moment matching for count data, we propose employing a combination of log-normal Laplace transform approximations, which avoids numerical integration. These novel results open the possibility of employing EP in settings where it was believed to be practically infeasible. Improvements over state-of-the-art approaches are illustrated on both simulated and real data. The efficient EP implementation is available at https://github.com/niccoloanceschi/EPglm.
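As a rough illustration of where a factor of min{p,n} can come from, the sketch below applies the standard Woodbury identity to a Gaussian approximation with precision I/nu2 + X^T diag(k) X. This is not the derivation used in the paper or the code in the EPglm repository. The function names, the isotropic prior variance `nu2`, and the positive site precisions `k` are all assumptions made for the example. It computes the p x n matrix Sigma X^T, which is what the moments of the linear predictors require, without ever forming a p x p matrix.

```python
import numpy as np

def sigma_xt_direct(X, k, nu2):
    """Sigma @ X.T via the p x p precision matrix: O(p^2 n + p^3)."""
    n, p = X.shape
    # Hypothetical Gaussian approximation: precision Q = I/nu2 + X^T diag(k) X
    Q = np.eye(p) / nu2 + X.T @ (k[:, None] * X)
    return np.linalg.solve(Q, X.T)

def sigma_xt_woodbury(X, k, nu2):
    """Same quantity via the Woodbury identity: O(p n^2 + n^3), linear in p."""
    G = X @ X.T                          # n x n Gram matrix, O(p n^2)
    S = np.diag(1.0 / k) + nu2 * G       # n x n system; requires k > 0
    # Sigma X^T = nu2 X^T - nu2^2 X^T S^{-1} (X X^T)
    return nu2 * X.T - nu2**2 * (X.T @ np.linalg.solve(S, G))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((5, 40))     # n = 5 observations, p = 40 covariates
    k = rng.uniform(0.5, 2.0, size=5)    # assumed positive site precisions
    direct = sigma_xt_direct(X, k, 2.0)
    fast = sigma_xt_woodbury(X, k, 2.0)
    print(np.allclose(direct, fast))
```

When p is much larger than n, the second routine only ever solves n x n systems, which is consistent with a cost that grows linearly in p.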