In this paper, we propose a novel data-driven method for estimating functions in a multivariate nonparametric regression model, based on adaptive knot selection for B-splines. Our knot-selection approach relies on the generalized lasso, since the knots of a B-spline basis can be viewed as points where the derivatives of the function to be estimated change. We then extend the method to functions of several variables by processing each dimension independently, which reduces the problem to a univariate setting. The regularization parameters are chosen with a criterion based on the extended Bayesian information criterion (EBIC). The nonparametric estimator is then obtained by multivariate B-spline regression using the selected knots. We validate the procedure through numerical experiments that vary the number of observations and the noise level in order to assess its robustness. We also assess the influence of the observation sampling and apply the method to a chemical system commonly used in geoscience. In every framework considered in this paper, our approach outperforms the state-of-the-art methods it is compared with. Our fully data-driven method is implemented in the R package glober, available on the Comprehensive R Archive Network (CRAN).
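To make the link between the generalized lasso and knot selection concrete, the following is a minimal univariate sketch in Python with NumPy. It is not the glober implementation, and several simplifications are assumptions of this sketch: the inputs are sorted and equally spaced; the splines are piecewise linear (degree 1), so the penalty matrix D is the second-difference operator and each nonzero entry of D b marks a change of slope; the generalized lasso is solved with a plain ADMM loop; and the regularization parameter is fixed by hand rather than chosen with the EBIC-based criterion. The function names (`second_difference_matrix`, `select_knots`, `hat_basis`, `fit_linear_spline`) are hypothetical, and the multivariate extension is not shown.

```python
import numpy as np

# Hypothetical sketch: degree-1 splines, ADMM solver, fixed lam (no EBIC).
# Not the glober code; names are illustrative only.


def second_difference_matrix(n):
    """(n-2) x n matrix D with (D @ b)[j] = b[j] - 2*b[j+1] + b[j+2]."""
    D = np.zeros((n - 2, n))
    for j in range(n - 2):
        D[j, j:j + 3] = [1.0, -2.0, 1.0]
    return D


def soft_threshold(v, t):
    """Elementwise sign(v) * max(|v| - t, 0); entries below t become exact zeros."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def select_knots(x, y, lam, rho=1.0, n_iter=2000):
    """Generalized lasso min_b 0.5*||y - b||^2 + lam*||D b||_1, solved by ADMM.

    D takes second differences, so each nonzero of D b is a change of slope,
    i.e. a knot of a piecewise-linear fit. Assumes x sorted and equally spaced.
    """
    n = len(y)
    D = second_difference_matrix(n)
    # The b-update solves (I + rho D^T D) b = rhs; the matrix never changes.
    A_inv = np.linalg.inv(np.eye(n) + rho * D.T @ D)
    z = np.zeros(n - 2)  # split variable, equals D b at convergence
    u = np.zeros(n - 2)  # scaled dual variable
    for _ in range(n_iter):
        b = A_inv @ (y + rho * D.T @ (z - u))
        z = soft_threshold(D @ b + u, lam / rho)
        u = u + D @ b - z
    # Row j of D is centred on x[j + 1]; report those points as knots.
    return x[1:-1][z != 0]


def hat_basis(x, breaks):
    """Degree-1 B-spline design matrix: column j is the hat function peaking at breaks[j]."""
    eye = np.eye(len(breaks))
    return np.column_stack([np.interp(x, breaks, eye[j]) for j in range(len(breaks))])


def fit_linear_spline(x, y, knots):
    """Least-squares degree-1 B-spline fit using the selected interior knots."""
    breaks = np.concatenate(([x.min()], np.sort(knots), [x.max()]))
    coef, *_ = np.linalg.lstsq(hat_basis(x, breaks), y, rcond=None)
    # For hat functions, the coefficients are the fitted values at the breaks.
    return breaks, coef


# Noiseless toy signal with a single change of slope at x = 0.5
x = np.linspace(0.0, 1.0, 101)
y = np.abs(x - 0.5)
knots = select_knots(x, y, lam=1e-4)
breaks, coef = fit_linear_spline(x, y, knots)
fitted = np.interp(x, breaks, coef)
```

On noisy data the value of `lam` governs how many knots survive, which is why a data-driven criterion such as the EBIC-based one described above is needed; lasso-type penalties may also flag a few neighbouring points around a single true change of slope.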