We investigate the statistical properties of a likelihood approach to nonparametric estimation of a singular distribution using deep generative models. More specifically, a deep generative model is used to model high-dimensional data that are assumed to concentrate around some low-dimensional structure. Estimating a distribution supported on such a structure, for example a low-dimensional manifold, is challenging because the distribution is singular with respect to the Lebesgue measure on the ambient space. In this model, the usual likelihood approach can fail to estimate the target distribution consistently because of this singularity. We prove that perturbing the data with instance noise offers a novel and effective solution, leading to consistent estimation of the underlying distribution with desirable convergence rates. We also characterize the class of distributions that can be efficiently estimated via deep generative models. This class is general enough to contain various structured distributions, such as product distributions, classically smooth distributions, and distributions supported on a low-dimensional manifold. Our analysis offers insight into how deep generative models can avoid the curse of dimensionality in nonparametric distribution estimation. We conduct a thorough simulation study and real data analysis to demonstrate empirically that the proposed data perturbation technique significantly improves estimation performance.
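As a toy illustration of why the singularity breaks a plain likelihood fit and how instance noise repairs it, the sketch below replaces the deep generative model with a full-dimensional Gaussian fitted by maximum likelihood. This is a deliberate simplification and not the estimator analyzed here. The data layout (a 1-D segment in the plane), the noise level `sigma`, and the helper names `perturb`, `gaussian_mle_avg_loglik`, and `demo` are all assumptions made for the example.

```python
import numpy as np


def perturb(x, sigma, rng):
    """Add isotropic Gaussian instance noise with standard deviation sigma."""
    return x + sigma * rng.standard_normal(x.shape)


def gaussian_mle_avg_loglik(x):
    """Average log-likelihood of x under the Gaussian fitted to x by maximum likelihood."""
    d = x.shape[1]
    cov = np.cov(x, rowvar=False, bias=True)  # MLE covariance (divides by n)
    _, logdet = np.linalg.slogdet(cov)        # -inf when cov is singular
    # At the MLE the average quadratic form equals d, so only logdet varies.
    return -0.5 * (d * np.log(2.0 * np.pi) + logdet + d)


def demo(seed=0, n=500, sigma=0.1):
    """Compare the fitted likelihood on singular data and on perturbed data."""
    rng = np.random.default_rng(seed)
    # Toy singular data (assumption): points on a 1-D segment in the 2-D plane.
    t = rng.uniform(-1.0, 1.0, size=n)
    x = np.column_stack([t, np.zeros(n)])
    clean = gaussian_mle_avg_loglik(x)                       # diverges to +inf
    noisy = gaussian_mle_avg_loglik(perturb(x, sigma, rng))  # finite
    return clean, noisy


if __name__ == "__main__":
    print(demo())
```

On the unperturbed data the fitted covariance is singular, so the maximized likelihood is unbounded and carries no usable information about the distribution along the segment. After perturbation the objective is finite and well defined. How the noise level should scale with the sample size is a question for the theory and is not addressed by this sketch.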