Nonparametric density estimation is an unsupervised learning problem. In this work we propose a two-step procedure whose first step casts the density estimation problem as a supervised regression problem. The advantage is that supervised learning methods can be applied afterwards. Compared to the standard nonparametric regression setting, however, the proposed procedure creates dependence among the training samples. Statistical risk bounds can therefore not be derived from the well-developed theory for i.i.d. data. To overcome this, we prove an oracle inequality for this specific form of data dependence. As an application, we show that under a compositional structure assumption on the underlying density, the proposed two-step method achieves convergence rates that are faster than the standard nonparametric rates. A simulation study illustrates the finite-sample performance.
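To make the two-step idea concrete, the following is a minimal sketch and not the construction analysed in this work, which the abstract does not specify. It assumes that the regression responses are obtained by sample splitting: a Gaussian kernel density estimate built from one half of the data is evaluated at the points of the other half. The function names, the bandwidth, and the use of polynomial least squares as the supervised learner are all hypothetical choices made for illustration. Note that every response depends on the same pilot sample, which is one way dependence among the training pairs can arise.

```python
import numpy as np


def gaussian_kde_values(points, sample, bandwidth):
    """Evaluate a 1-D Gaussian kernel density estimate built from `sample` at `points`."""
    diffs = (points[:, None] - sample[None, :]) / bandwidth
    kernel = np.exp(-0.5 * diffs**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth


def two_step_density_fit(data, bandwidth=0.3, degree=6, seed=0):
    """Hypothetical two-step density estimator (illustrative assumption only).

    Step 1 turns density estimation into regression by building responses.
    Step 2 fits a supervised regression model to the resulting pairs.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(np.asarray(data, dtype=float))
    half = len(shuffled) // 2
    pilot_sample, design_points = shuffled[:half], shuffled[half:]

    # Step 1: the responses are pilot density values at the design points.
    # All responses share the same pilot sample, so they are not independent.
    responses = gaussian_kde_values(design_points, pilot_sample, bandwidth)

    # Step 2: any supervised learner could be used here; polynomial least
    # squares is a stand-in chosen only to keep the sketch dependency-free.
    coeffs = np.polynomial.polynomial.polyfit(design_points, responses, degree)

    def density_estimate(x):
        x = np.asarray(x, dtype=float)
        fitted = np.polynomial.polynomial.polyval(x, coeffs)
        # Clip at zero because a density cannot be negative.
        return np.maximum(fitted, 0.0)

    return density_estimate


# Usage: estimate a standard normal density from simulated data.
sample = np.random.default_rng(1).standard_normal(2000)
estimate = two_step_density_fit(sample)
values_at_grid = estimate(np.array([-1.0, 0.0, 1.0]))
```

The polynomial fit is only reliable inside the range of the observed design points. A learner that can exploit structure in the density, such as the compositional structure mentioned above, would replace the second step in practice.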