Bayesian Generalized Linear Models (GLMs) define a flexible probabilistic framework for modeling categorical, ordinal and continuous data, and are widely used in practice. However, exact inference in GLMs is prohibitively expensive for large datasets, so approximations are required. The resulting approximation error adversely impacts the reliability of the model and is not accounted for in the uncertainty of the prediction. In this work, we introduce a family of iterative methods that explicitly model this error. They are uniquely suited to modern parallel computing hardware, efficiently recycle computations, and compress information to reduce both the time and memory requirements for GLMs. As we demonstrate on a realistically large classification problem, our method significantly accelerates training by explicitly trading off reduced computation for increased uncertainty.