Mixed-effects regression models are a useful subclass of regression models for grouped data: introducing random effects lets the correlation between observations within each group be captured conveniently when inferring the fixed effects. As such models are fit to increasingly large datasets with many groups, it is desirable that (a) inference time scales linearly with the number of groups and (b) when the dataset cannot be stored in one location, the inference workload can be distributed across multiple computational nodes in a numerically stable way. Current Bayesian inference approaches for mixed-effects regression models do not seem to address both challenges simultaneously. To address this, we develop an expectation propagation (EP) framework for the case of a single grouping factor that is both scalable and numerically stable when distributed. The main technical innovations are a sparse reparameterisation of the EP algorithm and a moment propagation (MP) based refinement for multivariate random effect factor approximations. Experiments show that this EP framework achieves linear scaling while attaining accuracy comparable to other scalable approximate Bayesian inference (ABI) approaches.