In the streaming data setting, where data arrive continuously or in frequent batches and the total amount of data is not known in advance, Bayesian models can employ recursive updates, incorporating each new batch of data into the posterior distribution of the model parameters. Filtering methods are currently used to perform these updates efficiently; however, they eventually degrade as the number of unique values within the filtered samples decreases. We propose Generative Filtering, a method for efficiently performing recursive Bayesian updates in the streaming setting. Generative Filtering retains the speed of a filtering method while using parallel updates to avoid degenerate distributions after repeated applications. We derive rates of convergence for Generative Filtering and conditions under which sufficient statistics can be used instead of storing all past data. We investigate the alleviation of filtering degradation through simulation and ecological species count data.
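The degradation described above can be illustrated with a minimal sketch. It is not Generative Filtering, and it is not taken from the paper; it only shows the problem under assumed conditions: a static mean parameter, unit-variance normal data, and a plain weight-and-resample filter with no rejuvenation step. The function name `resample_filter_unique_counts` and all of its parameters are hypothetical.

```python
import numpy as np


def resample_filter_unique_counts(n_particles=1000, n_batches=20,
                                  batch_size=10, true_mean=1.5, seed=0):
    """Track the number of unique particles under repeated resampling.

    Assumed toy model (not from the paper): data ~ Normal(theta, 1),
    with prior draws theta ~ Normal(0, 3^2).
    """
    rng = np.random.default_rng(seed)
    # Initial particles are draws from the assumed prior.
    particles = rng.normal(0.0, 3.0, size=n_particles)
    unique_counts = [np.unique(particles).size]

    for _ in range(n_batches):
        # A new batch of streaming data arrives.
        batch = rng.normal(true_mean, 1.0, size=batch_size)
        # Log-likelihood of the batch under each particle (unit variance).
        log_w = -0.5 * ((batch[None, :] - particles[:, None]) ** 2).sum(axis=1)
        # Normalize the weights in a numerically stable way.
        log_w -= log_w.max()
        w = np.exp(log_w)
        w /= w.sum()
        # Multinomial resampling: no new values are ever created.
        idx = rng.choice(n_particles, size=n_particles, p=w)
        particles = particles[idx]
        unique_counts.append(np.unique(particles).size)

    return unique_counts, particles


if __name__ == "__main__":
    counts, _ = resample_filter_unique_counts()
    print(counts)
```

Because resampling only copies existing values, the count of unique particles can never increase from one batch to the next, which is the degeneracy that motivates a method able to generate fresh samples.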