We develop extreme data compression for Bayesian model comparison, using the MOPED algorithm as well as the more general score compression. We find that Bayes factors computed from MOPED-compressed data are identical to those from the uncompressed data when the models are linear and the errors Gaussian. In nonlinear cases, whether the models are nested or not, we find negligible differences in the Bayes factors, and we show this explicitly for the Pantheon-SH0ES supernova dataset. We also investigate the sampling properties of the Bayesian Evidence treated as a frequentist statistic, and find that extreme data compression reduces the sampling variance of the Evidence but has no impact on the sampling distribution of Bayes factors. Since model comparison can be very computationally intensive, MOPED extreme data compression may offer significant savings in computational time.
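The linear, Gaussian statement can be checked numerically with a toy problem. The following is a minimal sketch under stated assumptions, not the authors' implementation: the straight-line model, the noise level, the zero-mean Gaussian priors, and all variable names are invented for illustration. It uses the linear compression t = A^T C^{-1} x, which for a linear model with parameter-independent covariance should span the same space as the MOPED vectors (MOPED additionally orthonormalises them, an invertible linear map that leaves Bayes factors unchanged). Evidences are analytic here because both the prior and the likelihood are Gaussian.

```python
import numpy as np

def gauss_logpdf(y, mean, cov):
    """Log density of the multivariate normal N(mean, cov) evaluated at y."""
    r = y - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (r @ np.linalg.solve(cov, r) + logdet + len(y) * np.log(2 * np.pi))

def log_evidence(y, design, noise_cov, prior_cov):
    """Analytic log-evidence for y = design @ theta + noise, theta ~ N(0, prior_cov)."""
    marginal_cov = noise_cov + design @ prior_cov @ design.T
    return gauss_logpdf(y, np.zeros(len(y)), marginal_cov)

rng = np.random.default_rng(0)

# Hypothetical toy setup: 50 data points, straight-line model.
n = 50
z = np.linspace(0.0, 1.0, n)
A = np.column_stack([np.ones(n), z])  # full model M1: intercept and slope
B = np.array([[1.0], [0.0]])          # nested model M0: slope fixed to zero
C = 0.25 * np.eye(n)                  # known data covariance (assumed)
P1 = np.eye(2)                        # Gaussian prior covariance for M1 (assumed)
P0 = np.eye(1)                        # Gaussian prior covariance for M0 (assumed)

# Simulate one dataset from the full model.
x = A @ np.array([0.3, 0.8]) + rng.multivariate_normal(np.zeros(n), C)

# Compress 50 numbers to 2, one per parameter of the larger model.
Cinv_A = np.linalg.solve(C, A)
t = Cinv_A.T @ x       # compressed data
F = A.T @ Cinv_A       # Fisher matrix; also the noise covariance of t

# Log Bayes factor ln(Z1 / Z0) from the full data.
lnB_full = log_evidence(x, A, C, P1) - log_evidence(x, A @ B, C, P0)

# Same quantity from the compressed data, where t = F @ theta + noise with covariance F.
lnB_comp = log_evidence(t, F, F, P1) - log_evidence(t, F @ B, F, P0)

print("ln B (full data)      :", lnB_full)
print("ln B (compressed data):", lnB_comp)
```

In this sketch both models are compressed with the same vectors, built from the parameters of the larger model, so t is a sufficient statistic for either model and the data-dependent factor lost in compression cancels in the ratio of evidences. The individual evidences differ between full and compressed data; only their ratio is preserved.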