In this paper, we consider nonparametric estimation over general Dirichlet metric measure spaces. Unlike the more commonly studied reproducing kernel Hilbert spaces, whose elements can be defined pointwise, a Dirichlet space typically contains only equivalence classes, i.e., its elements are unique only almost everywhere. This lack of pointwise definition presents significant challenges for nonparametric estimation; for example, the classical ridge regression problem is ill-posed. In this paper, we develop a new technique for renormalizing the ridge loss by replacing pointwise evaluations with certain \textit{local means} around the boundaries of obstacles centered at each data point. The resulting renormalized empirical risk functional is well-posed and even admits a representer theorem in terms of certain equilibrium potentials, which are truncated versions of the associated Green function, cut off at a data-driven threshold. We study the global, out-of-sample consistency of the sample minimizer and derive an adaptive upper bound on its convergence rate that highlights the interplay of the analytic, geometric, and probabilistic properties of the Dirichlet form. Notably, our framework does not require smoothness of the underlying space and applies to both manifold and fractal settings. To the best of our knowledge, this is the first paper to obtain out-of-sample convergence guarantees in the framework of general metric measure Dirichlet spaces.