We develop extreme data compression for Bayesian model comparison, using the MOPED algorithm as well as more general score compression. We find that Bayes factors computed from MOPED-compressed data are identical to those computed from the uncompressed data when the models are linear and the errors are Gaussian. In nonlinear cases, whether or not the models are nested, we find negligible differences in the Bayes factors, and show this explicitly for the Pantheon-SH0ES supernova dataset. We also investigate the sampling properties of the Bayesian Evidence as a frequentist statistic, and find that extreme data compression reduces the sampling variance of the Evidence but has no impact on the sampling distribution of Bayes factors. Since model comparison can be a very computationally intensive task, MOPED extreme data compression may offer significant savings in computational time.
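The linear-Gaussian claim can be checked numerically with a small sketch. Everything specific here is an illustrative assumption rather than the paper's setup: the polynomial design matrix, the noise levels, the zero-mean unit-variance Gaussian priors, and the choice to compress both nested models with the vectors of the larger model. The compression uses the unnormalised vectors C^{-1}A, which span the same space as the orthonormalised MOPED vectors for a linear model, so the two differ only by an invertible linear map that cancels in the Bayes factor.

```python
import numpy as np

def gauss_logpdf(v, cov):
    """Log density of a zero-mean multivariate Gaussian evaluated at v."""
    _, logdet = np.linalg.slogdet(cov)
    quad = v @ np.linalg.solve(cov, v)
    return -0.5 * (quad + logdet + len(v) * np.log(2.0 * np.pi))

rng = np.random.default_rng(0)
n = 50
t = np.linspace(0.0, 1.0, n)

# Hypothetical linear models: x = A @ theta + noise.
A_full = np.column_stack([np.ones(n), t, t**2])  # 3-parameter model
A_nest = A_full[:, :2]                           # nested 2-parameter model

sigma = 0.05 + 0.1 * t          # assumed heteroscedastic Gaussian errors
C = np.diag(sigma**2)           # data covariance
P_full = np.eye(3)              # assumed Gaussian prior covariances
P_nest = np.eye(2)

# Simulate one dataset from the larger model.
theta_true = np.array([0.5, -1.0, 0.8])
x = A_full @ theta_true + rng.normal(0.0, sigma)

# Compression vectors B = C^{-1} A (one per parameter of the larger model).
B = np.linalg.solve(C, A_full)  # shape (n, 3)
y = B.T @ x                     # 50 data points -> 3 numbers

def log_evidence(data, W, A, P):
    """Analytic log-evidence of data = W.T @ x for a linear model with a
    zero-mean Gaussian prior: data ~ N(0, W.T (C + A P A.T) W)."""
    cov = W.T @ (C + A @ P @ A.T) @ W
    return gauss_logpdf(data, cov)

I = np.eye(n)
log_bf_full = log_evidence(x, I, A_full, P_full) - log_evidence(x, I, A_nest, P_nest)
log_bf_comp = log_evidence(y, B, A_full, P_full) - log_evidence(y, B, A_nest, P_nest)

print("log Bayes factor, uncompressed:", log_bf_full)
print("log Bayes factor, compressed:  ", log_bf_comp)
```

In this sketch the individual evidences change under compression, but the change is a model-independent factor, so the two log Bayes factors agree to numerical precision.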