In the streaming data setting, where data arrive continuously or in frequent batches and the total amount of data is not determined in advance, Bayesian models can employ recursive updates, incorporating each new batch of data into the posterior distribution of the model parameters. Filtering methods are currently used to perform these updates efficiently; however, they eventually degrade as the number of unique values within the filtered samples decreases. We propose Generative Filtering, a method for efficiently performing recursive Bayesian updates in the streaming setting. Generative Filtering retains the speed of a filtering method while using parallel updates to avoid degenerate distributions after repeated applications. We derive rates of convergence for Generative Filtering and conditions under which sufficient statistics can be used instead of storing all past data. We investigate how well Generative Filtering alleviates filtering degradation using simulations and ecological species count data.
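The degradation described above can be illustrated with a minimal sketch. It is not Generative Filtering; it is a plain reweight-and-resample filter under assumptions chosen only for illustration: a Normal likelihood with known variance, a Normal(0, 3^2) prior, and the hypothetical function names `resample_filter` and `unique_counts`. Because resampling only copies existing parameter draws, the number of unique values can never increase from one batch to the next.

```python
import math
import random


def resample_filter(particles, batch, sigma=1.0, rng=random):
    """One filtering step: weight each particle by the batch likelihood, then resample."""
    # Log-likelihood of the batch under an assumed Normal(theta, sigma^2) model.
    log_w = [
        -sum((y - theta) ** 2 for y in batch) / (2 * sigma ** 2)
        for theta in particles
    ]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_w)
    weights = [math.exp(lw - m) for lw in log_w]
    # Multinomial resampling: only copies of existing values are produced.
    return rng.choices(particles, weights=weights, k=len(particles))


def unique_counts(n_particles=500, n_batches=20, batch_size=10,
                  true_mean=1.0, seed=0):
    """Track the number of unique particle values across streaming batches."""
    rng = random.Random(seed)
    # Initial particles are draws from the assumed Normal(0, 3^2) prior.
    particles = [rng.gauss(0.0, 3.0) for _ in range(n_particles)]
    counts = [len(set(particles))]
    for _ in range(n_batches):
        # Simulate a new batch arriving from the stream.
        batch = [rng.gauss(true_mean, 1.0) for _ in range(batch_size)]
        particles = resample_filter(particles, batch, rng=rng)
        counts.append(len(set(particles)))
    return counts
```

Calling `print(unique_counts())` shows the count of unique particle values after each batch; in this sketch the sequence is non-increasing, which is the degeneracy that repeated filtering suffers from.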