In the streaming data setting, where data arrive continuously or in frequent batches and the total amount of data is not predetermined, Bayesian models can employ recursive updates, incorporating each new batch of data into the posterior distribution of the model parameters. Filtering methods are currently used to perform these updates efficiently; however, they eventually degrade as the number of unique values within the filtered samples decreases. We propose Generative Filtering, a method for efficiently performing recursive Bayesian updates in the streaming setting. Generative Filtering retains the speed of a filtering method while using parallel updates to avoid degenerate distributions after repeated applications. We derive rates of convergence for Generative Filtering and conditions under which sufficient statistics can be used instead of storing all past data. We investigate how well Generative Filtering alleviates filtering degradation using simulations and ecological species count data.
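The degradation that motivates this work can be illustrated with a minimal sketch. The code below is not Generative Filtering; it is a plain weight-and-resample filter for a hypothetical model, y ~ Normal(theta, 1) with a Normal(0, 1) prior on theta, chosen only for illustration. All function names, the model, and the parameter values are assumptions. Because resampling can only copy existing sample values, the number of unique values can never increase from one batch to the next.

```python
import math
import random


def resample_filter(particles, batch, rng):
    """One recursive update: weight each sample by the batch likelihood, then resample."""
    # Log-likelihood of the batch under y ~ Normal(theta, 1), up to an additive constant.
    log_w = [-0.5 * sum((y - theta) ** 2 for y in batch) for theta in particles]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_w)
    weights = [math.exp(lw - m) for lw in log_w]
    # Resample with replacement; only existing values can be drawn.
    return rng.choices(particles, weights=weights, k=len(particles))


def run_demo(n_particles=500, n_batches=20, batch_size=10, true_theta=1.0, seed=0):
    """Return the number of unique sample values before and after each batch."""
    rng = random.Random(seed)
    # Initial samples drawn from the assumed Normal(0, 1) prior.
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    unique_counts = [len(set(particles))]
    for _ in range(n_batches):
        # Simulate a new batch arriving from the stream.
        batch = [rng.gauss(true_theta, 1.0) for _ in range(batch_size)]
        particles = resample_filter(particles, batch, rng)
        unique_counts.append(len(set(particles)))
    return unique_counts


if __name__ == "__main__":
    print(run_demo())
```

Running the script prints the unique-value count after each batch. The sequence is non-increasing by construction, which is the sample impoverishment that repeated filtering updates suffer from.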