Nonparametric Regression in Dirichlet Spaces: A Random Obstacle Approach

In this paper, we consider nonparametric estimation over general Dirichlet metric measure spaces. Unlike the more commonly studied reproducing kernel Hilbert spaces, whose elements are defined pointwise, a Dirichlet space typically contains only equivalence classes of functions, i.e., its elements are determined only up to almost-everywhere equality. This lack of pointwise definition presents significant challenges for nonparametric estimation; for example, the classical ridge regression problem is ill-posed. We develop a new technique for renormalizing the ridge loss by replacing pointwise evaluations with certain \textit{local means} around the boundaries of obstacles centered at each data point. The resulting renormalized empirical risk functional is well-posed and even admits a representer theorem in terms of certain equilibrium potentials, which are truncated versions of the associated Green function, cut off at a data-driven threshold. We study the global, out-of-sample consistency of the sample minimizer and derive an adaptive upper bound on its convergence rate that highlights the interplay of the analytic, geometric, and probabilistic properties of the Dirichlet form. We also construct a simple regressogram-type estimator that, given some knowledge of the geometry of the metric measure space, achieves the minimax optimal estimation rate over certain $L^p$ subsets of a Dirichlet ball. Notably, our framework does not require smoothness of the underlying space and applies to both manifold and fractal settings. To the best of our knowledge, this is the first paper to obtain out-of-sample convergence guarantees in the framework of general metric measure Dirichlet spaces.
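
As a schematic illustration of the renormalization described above, the two objectives can be written side by side. All notation here is assumed for illustration and is not taken from the paper: $(x_i, y_i)_{i=1}^n$ are the data, $\mathcal{E}$ is the Dirichlet form with domain $\mathcal{F}$, $\lambda > 0$ is a regularization parameter, $B_i$ is an obstacle centered at $x_i$, and $\nu_i$ is a hypothetical probability measure supported on $\partial B_i$ that plays the role of the local mean. The paper's precise definitions may differ.

\[
\text{(classical ridge)} \qquad \min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2 + \lambda \, \mathcal{E}(f, f),
\]

\[
\text{(renormalized, schematic)} \qquad \min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} \Bigl( y_i - \int_{\partial B_i} f \, d\nu_i \Bigr)^2 + \lambda \, \mathcal{E}(f, f).
\]

The first objective is not well defined on equivalence classes, since $f(x_i)$ can be altered on a null set without changing $f$ as an element of $\mathcal{F}$. A standard example is the Sobolev space $H^1(\mathbb{R}^d)$ with $d \ge 2$, where single points have zero capacity, so the data can be interpolated by narrow spikes of arbitrarily small energy and no minimizer exists. The second objective avoids this by evaluating $f$ only through averages over the obstacle boundaries.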
