In this paper, we consider nonparametric estimation over general Dirichlet metric measure spaces. Unlike the more commonly studied reproducing kernel Hilbert spaces, whose elements may be defined pointwise, a Dirichlet space typically contains only equivalence classes, i.e., its elements are unique only almost everywhere. This lack of pointwise definition presents significant challenges in the context of nonparametric estimation; for example, the classical ridge regression problem is ill-posed. We develop a new technique for renormalizing the ridge loss by replacing pointwise evaluations with certain \textit{local means} around the boundaries of obstacles centered at each data point. The resulting renormalized empirical risk functional is well-posed and even admits a representer theorem in terms of certain equilibrium potentials, which are truncated versions of the associated Green function, cut off at a data-driven threshold. We study the global, out-of-sample consistency of the sample minimizer and derive an adaptive upper bound on its convergence rate that highlights the interplay of the analytic, geometric, and probabilistic properties of the Dirichlet form. Notably, our framework does not require smoothness of the underlying space and is applicable to both manifold and fractal settings. To the best of our knowledge, this is the first paper to obtain out-of-sample convergence guarantees in the framework of general metric measure Dirichlet spaces.
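To fix ideas, the following display is a minimal sketch of the renormalization at the level of the objective; the symbols ($\mathcal{E}$ for the Dirichlet form, $\lambda$ for the regularization parameter, and $\nu_i$ for the local-mean measure attached to the $i$-th data point) are illustrative placeholders, not notation fixed by this abstract.
\begin{equation*}
\frac{1}{n}\sum_{i=1}^{n}\bigl(f(x_i)-y_i\bigr)^2+\lambda\,\mathcal{E}(f,f)
\quad\rightsquigarrow\quad
\frac{1}{n}\sum_{i=1}^{n}\Bigl(\int f\,d\nu_i-y_i\Bigr)^2+\lambda\,\mathcal{E}(f,f),
\end{equation*}
where the left-hand objective is ill-posed because $f(x_i)$ is not well-defined for an equivalence class $f$, whereas on the right each pointwise evaluation is replaced by a local mean $\int f\,d\nu_i$ taken around the boundary of an obstacle centered at $x_i$, which, under suitable conditions on $\nu_i$, is insensitive to the choice of representative of $f$.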