Density deconvolution is the problem of estimating the unknown probability density function $f$ of a random signal from data observed with independent additive random noise. It is a classical problem in statistics, for which frequentist and Bayesian nonparametric approaches are available for static or batch data. In this paper, we consider density deconvolution in a streaming or online setting, where noisy data arrive progressively with no predetermined sample size, and we develop a sequential nonparametric approach to estimate $f$. By relying on a quasi-Bayesian sequential approach, often referred to as Newton's algorithm, we obtain estimates of $f$ that are easy to evaluate and computationally efficient, with a computational cost that remains constant as the amount of data increases, which is critical in the streaming setting. We study the large-sample asymptotic properties of the proposed estimates, obtaining provable guarantees for the estimation of $f$ at a point (local) and on an interval (uniform). In particular, we establish local and uniform central limit theorems, which provide corresponding asymptotic credible intervals and bands. We validate our methods empirically on synthetic and real data, considering the common setting of Laplace and Gaussian noise distributions, and compare them with the kernel-based approach and with a Bayesian nonparametric approach based on a Dirichlet process mixture prior.
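To make the sequential idea concrete, the following is a minimal sketch of a Newton-type recursive update for deconvolution on a fixed grid. It is an illustration under stated assumptions, not the estimator studied in the paper: the grid discretization, the learning rate $\alpha_n = 1/(n+1)$, the Gaussian initial guess, the Laplace noise scale, and the names `laplace_pdf` and `newton_update` are all hypothetical choices made for this example. Each new observation $y_n$ reweights the current estimate by the noise density evaluated at $y_n - \theta$, and the result is mixed with the previous estimate, so the work per observation depends only on the grid size and not on how many data points have been seen.

```python
import numpy as np

def laplace_pdf(z, scale=0.5):
    """Density of zero-mean Laplace noise (scale assumed known for this sketch)."""
    return np.exp(-np.abs(z) / scale) / (2.0 * scale)

def newton_update(f_prev, grid, y, n, noise_pdf=laplace_pdf):
    """One Newton-type recursive update of the signal density on a fixed grid.

    f_prev : current density estimate evaluated on `grid`
    y      : new noisy observation
    n      : index of the observation (1, 2, ...)
    """
    dx = grid[1] - grid[0]
    alpha = 1.0 / (n + 1)                 # assumed learning-rate sequence
    post = noise_pdf(y - grid) * f_prev   # reweight by the noise likelihood
    post = post / (post.sum() * dx)       # normalize with a Riemann sum
    return (1.0 - alpha) * f_prev + alpha * post

# Synthetic stream: N(1, 1) signal observed with Laplace(0, 0.5) noise.
rng = np.random.default_rng(0)
grid = np.linspace(-8.0, 8.0, 801)
dx = grid[1] - grid[0]

# Initial guess: a wide Gaussian shape, normalized on the grid.
f_hat = np.exp(-grid**2 / (2.0 * 3.0**2))
f_hat /= f_hat.sum() * dx

signal = rng.normal(1.0, 1.0, size=2000)
noisy = signal + rng.laplace(0.0, 0.5, size=2000)

# Process observations one at a time; only f_hat is kept in memory.
for n, y in enumerate(noisy, start=1):
    f_hat = newton_update(f_hat, grid, y, n)
```

Because each update is a convex combination of two normalized densities, `f_hat` stays nonnegative and keeps integrating to one on the grid, up to floating-point error. Swapping `laplace_pdf` for a Gaussian density would give the analogous sketch for Gaussian noise.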