Accurate approximation of a real-valued function depends on two aspects of the available data: the density of inputs within the domain of interest and the variation of the outputs over that domain. There are few methods for assessing whether the density of inputs is \textit{sufficient} to identify the relevant variations in outputs -- i.e., the ``geometric scale'' of the function -- despite the fact that sampling density is closely tied to the success or failure of an approximation method. In this paper, we introduce a general-purpose computational approach to detecting the geometric scale of real-valued functions over a fixed domain using a deterministic interpolation technique from computational geometry. The algorithm is intended to work on scalar data in moderate dimensions (2--10). Our algorithm is based on the observation that a sequence of piecewise linear interpolants will converge to a continuous function at a quadratic rate (in $L^2$ norm) if and only if the data are sampled densely enough to distinguish the function's features from noise (assuming sufficiently regular sampling). We present numerical experiments demonstrating how our method can identify feature scale, estimate uncertainty in feature scale, and assess the sampling density for fixed (i.e., static) datasets of input-output pairs. We include analytical results in support of our numerical findings and have released lightweight code that can be adapted for use in a variety of data science settings.
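The convergence-rate observation can be illustrated with a small sketch. This is not the released code or the method described above: it is a one-dimensional simplification on a uniform grid, whereas the approach in the paper targets dimensions 2--10 with an interpolation technique from computational geometry. The names `linear_interp`, `l2_error`, `observed_rates`, and `feature` are hypothetical, and the test function (a sine with 8 oscillations on $[0,1]$) is an assumption chosen for illustration.

```python
import math


def linear_interp(xs, ys, x):
    """Evaluate the piecewise linear interpolant of (xs, ys) at x.

    Assumes xs is a uniform, increasing grid (an illustrative simplification).
    """
    n = len(xs) - 1
    h = xs[1] - xs[0]
    i = min(int((x - xs[0]) / h), n - 1)  # index of the containing interval
    t = (x - xs[i]) / h
    return (1.0 - t) * ys[i] + t * ys[i + 1]


def l2_error(f, n, m=4000):
    """Approximate the L2 error on [0, 1] of the interpolant built from
    n + 1 uniform samples of f, using an m-point midpoint rule."""
    xs = [i / n for i in range(n + 1)]
    ys = [f(x) for x in xs]
    total = 0.0
    for j in range(m):
        x = (j + 0.5) / m
        total += (f(x) - linear_interp(xs, ys, x)) ** 2
    return math.sqrt(total / m)


def observed_rates(f, ns):
    """Empirical convergence rate between successive sample counts in ns.

    A rate near 2 suggests the samples resolve the function's features;
    a rate well below 2 suggests under-sampling.
    """
    errs = [l2_error(f, n) for n in ns]
    return [
        math.log(errs[i] / errs[i + 1]) / math.log(ns[i + 1] / ns[i])
        for i in range(len(ns) - 1)
    ]


def feature(x):
    # Hypothetical test function: 8 full oscillations on [0, 1].
    return math.sin(2.0 * math.pi * 8.0 * x)


if __name__ == "__main__":
    ns = [4, 8, 16, 32, 64, 128, 256]
    for n, rate in zip(ns[1:], observed_rates(feature, ns)):
        print(f"n = {n:4d}  observed rate = {rate:.2f}")
```

In this toy setting, the coarsest grids sample the sine only at its zeros, so refining them does not reduce the error and the observed rate stays near zero. Once there are many samples per oscillation, the rate settles close to 2. The sample count at which that transition occurs gives a rough indication of the feature scale.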