In this work, we propose a novel deep bootstrap framework for nonparametric regression based on conditional diffusion models. Specifically, we construct a conditional diffusion model to learn the distribution of the response variable given the covariates. This model is then used to generate bootstrap samples by pairing the original covariates with newly synthesized responses. We reformulate nonparametric regression as conditional sample mean estimation, implemented directly via the learned conditional diffusion model. Unlike traditional bootstrap methods, which treat conditional distribution estimation, sampling, and nonparametric regression as decoupled steps, our approach integrates these components into a unified generative framework. With the expressive capacity of diffusion models, our method enables both efficient sampling from high-dimensional or multimodal distributions and accurate nonparametric estimation. We establish rigorous theoretical guarantees for the proposed method; in particular, we derive optimal end-to-end convergence rates in the Wasserstein distance between the learned and target conditional distributions. Building on this foundation, we further establish convergence guarantees for the resulting bootstrap procedure. Numerical studies demonstrate the effectiveness and scalability of our approach on complex regression tasks.
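The pipeline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: a hand-coded conditional sampler (here a Gaussian centered at a hypothetical regression function `true_mean`) stands in for a trained conditional diffusion model, and all names and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_mean(x):
    """Hypothetical regression function m(x); unknown in practice."""
    return np.sin(2 * np.pi * x)

def sample_conditional(x, n_draws):
    """Stand-in for sampling y | x from a learned conditional diffusion model.

    Here we draw y | x ~ N(m(x), 0.3^2) purely for illustration;
    in the proposed framework these draws come from the diffusion sampler.
    """
    x = np.asarray(x)
    return true_mean(x)[..., None] + 0.3 * rng.standard_normal(x.shape + (n_draws,))

# Original covariates (a trained model would have seen paired (X, Y) data).
n = 200
X = rng.uniform(0.0, 1.0, size=n)

# Nonparametric regression as conditional sample mean estimation:
# average M synthesized responses at each covariate value.
M = 500
m_hat = sample_conditional(X, M).mean(axis=-1)

# Bootstrap: pair the original covariates with newly synthesized responses.
B = 100
boot_samples = [(X, sample_conditional(X, 1)[..., 0]) for _ in range(B)]
```

Each element of `boot_samples` is one bootstrap dataset `(X, Y*)` on which a downstream statistic can be recomputed; `m_hat` is the Monte Carlo conditional mean estimate at the observed covariates.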