Approximating a univariate function on the interval $[-1,1]$ by a polynomial is among the most classical problems in numerical analysis. When the function evaluations are noisy, a least-squares fit is known to reduce the effect of the noise as more samples are taken. The generic algorithm for the least-squares problem requires $O(Nn^2)$ operations, where $N+1$ is the number of sample points and $n$ is the degree of the polynomial approximant. This algorithm is unstable when $n$ is large, for example when $n\gg \sqrt{N}$ for equispaced sample points. In this study, we blend numerical analysis and statistics to introduce NoisyChebtrunc, a stable and fast $O(N\log N)$ algorithm based on Chebyshev interpolation. It achieves the same error reduction as least squares: the error converges spectrally until it reaches $O(\sigma \sqrt{n/N})$, where $\sigma$ is the noise level, after which it continues to decrease at the Monte-Carlo rate $O(1/\sqrt{N})$. To determine the polynomial degree, NoisyChebtrunc employs a statistical criterion, namely Mallows' $C_p$. We analyze the variance of NoisyChebtrunc and its concentration, in the infinity norm, around the underlying noiseless function. These results show that, when the noise is independent and follows a subgaussian or subexponential distribution, the infinity-norm error is bounded with high probability by a small constant times $\sigma \sqrt{n/N}$. We illustrate the performance of NoisyChebtrunc with numerical experiments.
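The interpolate-then-truncate idea can be illustrated with a minimal Python/NumPy sketch. This is not the authors' NoisyChebtrunc implementation. It assumes the following: the $N+1$ samples are taken at Chebyshev points of the second kind, the noise level $\sigma$ is known, and the degree is chosen by a simplified Mallows' $C_p$-style rule in the coefficient domain. That rule treats every Chebyshev coefficient as carrying independent noise of variance $\tau^2 = 2\sigma^2/N$, which is only an approximation. All function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def cheb_points(N):
    """Chebyshev points of the second kind, x_j = cos(pi*j/N), j = 0..N."""
    return np.cos(np.pi * np.arange(N + 1) / N)


def cheb_coeffs(values):
    """Coefficients c_0..c_N of the degree-N interpolant through `values`
    sampled at cheb_points(N), computed with one FFT (a DCT-I) in O(N log N)."""
    v = np.asarray(values, dtype=float)
    N = len(v) - 1
    ext = np.concatenate([v, v[-2:0:-1]])      # even extension of length 2N
    c = np.fft.fft(ext)[: N + 1].real / N
    c[0] /= 2                                  # endpoint coefficients are halved
    c[N] /= 2
    return c


def select_degree_cp(c, sigma):
    """Hypothetical simplified Mallows' C_p-style rule (assumes sigma is known):
    minimize sum_{k>n} c_k^2 + 2*(n+1)*tau^2 over n, where
    tau^2 = 2*sigma^2/N approximates the noise variance of each coefficient."""
    N = len(c) - 1
    tau2 = 2.0 * sigma**2 / N
    sq = np.asarray(c) ** 2
    suffix = np.cumsum(sq[::-1])[::-1]         # suffix[n] = sum_{k>=n} c_k^2
    tail = np.append(suffix[1:], 0.0)          # tail[n]   = sum_{k>n}  c_k^2
    scores = tail + 2.0 * (np.arange(N + 1) + 1) * tau2
    return int(np.argmin(scores))


def noisy_cheb_trunc_sketch(values, sigma):
    """Interpolate all N+1 noisy samples, then truncate to the selected degree."""
    c = cheb_coeffs(values)
    n = select_degree_cp(c, sigma)
    return c[: n + 1], n


# Usage: noisy samples of exp(x) with noise level sigma = 0.01.
rng = np.random.default_rng(0)
N, sigma = 2000, 0.01
x = cheb_points(N)
y = np.exp(x) + sigma * rng.standard_normal(N + 1)
coef, n = noisy_cheb_trunc_sketch(y, sigma)
xx = np.linspace(-1, 1, 2001)
err = np.max(np.abs(C.chebval(xx, coef) - np.exp(xx)))
print("selected degree:", n, "max error:", err)
```

The FFT keeps the cost of computing all $N+1$ coefficients at $O(N\log N)$. Penalizing the discarded tail energy with $2(n+1)\tau^2$ mirrors the form that Mallows' $C_p$ takes for an orthogonal design, which is why truncation rather than a dense least-squares solve suffices in this sketch.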