Nonparametric density estimation is an unsupervised learning problem. In this work, we propose a two-step procedure whose first step casts the density estimation problem as a supervised regression problem. The advantage is that supervised learning methods can then be applied. Compared to the standard nonparametric regression setting, however, the proposed procedure creates dependence among the training samples. To derive statistical risk bounds, one therefore cannot rely on the well-developed theory for i.i.d. data. To overcome this, we prove an oracle inequality for this specific form of data dependence. As an application, we show that, under a compositional structure assumption on the underlying density, the proposed two-step method achieves faster convergence rates. A simulation study illustrates the finite-sample performance.