Bayesian Generalized Linear Models (GLMs) define a flexible probabilistic framework for modeling categorical, ordinal, and continuous data, and are widely used in practice. However, exact inference in GLMs is prohibitively expensive for large datasets, so approximations are required. The resulting approximation error adversely impacts the reliability of the model and is not accounted for in the uncertainty of the prediction. In this work, we introduce a family of iterative methods that explicitly model this error. They are uniquely suited to modern parallel computing hardware, efficiently recycle computations, and compress information to reduce both the time and memory requirements of GLMs. As we demonstrate on a realistically large classification problem, our method significantly accelerates training compared to competitive baselines by trading reduced computation for increased uncertainty.
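The trade-off between computation and uncertainty can be illustrated with a toy analogue. The sketch below assumes a Gaussian likelihood with identity link (Bayesian linear regression), where exact inference is available in closed form. It follows the general idea of computation-aware Gaussian process inference: the model conditions only on the data directions spanned by a chosen set of "actions", so fewer actions means less computation but a predictive variance that is never smaller than the exact one. This is not the authors' GLM method, which targets non-Gaussian likelihoods. The function name `computation_aware_predict`, the prior and noise variances, and the choice of unit-vector actions are hypothetical, chosen for illustration only.

```python
import numpy as np

def computation_aware_predict(X, y, x_star, actions, prior_var=1.0, noise_var=0.1):
    """Hypothetical sketch: predictive mean/variance of a Bayesian linear model
    (prior w ~ N(0, prior_var * I), Gaussian noise) that only uses the data
    directions spanned by the columns of `actions` (shape n x i)."""
    K = prior_var * X @ X.T                       # prior covariance of f = X w
    K_hat = K + noise_var * np.eye(len(y))        # add observation noise
    S = actions
    # C = S (S^T K_hat S)^{-1} S^T is a low-rank approximation of K_hat^{-1};
    # it equals K_hat^{-1} exactly when S has full rank n.
    C = S @ np.linalg.solve(S.T @ K_hat @ S, S.T)
    k_star = prior_var * X @ x_star               # cov(f(X), f(x_star))
    k_ss = prior_var * x_star @ x_star            # prior variance of f(x_star)
    mean = k_star @ C @ y
    # Variance includes the error from limited computation: it is never
    # smaller than the exact posterior predictive variance.
    var = k_ss - k_star @ C @ k_star
    return mean, var

# Usage: conditioning on more data directions shrinks the predictive variance.
rng = np.random.default_rng(0)
n, d = 50, 5
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.3 * rng.normal(size=n)
x_star = rng.normal(size=d)
for i in (5, 20, 50):
    S = np.eye(n)[:, :i]  # unit-vector actions: use the first i data points
    m, v = computation_aware_predict(X, y, x_star, S)
    print(i, round(float(m), 3), round(float(v), 4))
```

With unit-vector actions, each additional action conditions on one more data point, so the nested action spaces give a monotonically decreasing variance. Other action choices (for example, directions produced by an iterative solver) fit the same formula.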