In this paper, we outline a novel data-driven method for estimating functions in a multivariate nonparametric regression model, based on adaptive knot selection for B-splines. Our knot-selection approach applies the generalized lasso, because the knots of a B-spline basis can be seen as points where the derivatives of the function to be estimated change. We then extend this method to functions of several variables by processing each dimension independently, which reduces the problem to a univariate setting. The regularization parameters are selected using a criterion based on the extended Bayesian information criterion (EBIC). The nonparametric estimator is then obtained by multivariate B-spline regression with the corresponding selected knots. We validate our procedure through numerical experiments in which the number of observations and the noise level are varied to investigate its robustness. We also assess the influence of the observation sampling, and we apply our method to a chemical system commonly used in geoscience. In every framework considered in this paper, our approach outperforms state-of-the-art methods. Our completely data-driven method is implemented in the glober R package, which will soon be available on the Comprehensive R Archive Network (CRAN).
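The idea that knots appear where derivatives change can be illustrated in the simplest possible case. The following is a minimal sketch and not the glober implementation. It assumes a noiseless, piecewise-linear signal on an evenly spaced grid. Under that assumption, the second-order difference operator D^(2) is nonzero exactly at the kinks, which are the knots of a degree-1 spline. With noisy data, one would instead solve a generalized lasso problem of the form min over beta of (1/2)||y - beta||^2 + lambda ||D beta||_1 and read the knots off the nonzero entries of D beta. That solver step is omitted here. The sketch also includes one common form of EBIC that could score candidate knot sets. The helper names `second_differences`, `kink_locations` and `ebic` are hypothetical, and the exact criterion used in the paper may differ.

```python
import math


def second_differences(y):
    """Discrete second derivative D^(2) y on an evenly spaced grid."""
    return [y[i + 2] - 2 * y[i + 1] + y[i] for i in range(len(y) - 2)]


def kink_locations(x, y, tol=1e-9):
    """Grid points where the slope of a noiseless piecewise-linear signal changes.

    Entry i of D^(2) y is centred at x[i + 1], so a nonzero entry there
    marks a knot of a degree-1 spline (assumption: no noise, even spacing).
    """
    d2 = second_differences(y)
    return [x[i + 1] for i, v in enumerate(d2) if abs(v) > tol]


def ebic(rss, n, n_selected, p, gamma=1.0):
    """One common EBIC form under Gaussian errors (hypothetical helper):

    n * log(RSS / n) + k * log(n) + 2 * gamma * log(C(p, k)),
    where k = n_selected parameters are chosen among p candidates (k <= p).
    """
    return (n * math.log(rss / n)
            + n_selected * math.log(n)
            + 2 * gamma * math.log(math.comb(p, n_selected)))


if __name__ == "__main__":
    x = list(range(11))
    # Piecewise-linear signal with slope changes at x = 3 and x = 7.
    y = [abs(t - 3) + 2 * max(0, t - 7) for t in x]
    print(kink_locations(x, y))  # prints [3, 7]
    # With an equal residual sum of squares, a larger knot set is penalized more.
    print(ebic(10.0, 100, 2, 10) < ebic(10.0, 100, 3, 10))  # prints True
```

In a full procedure, the regularization parameter lambda would be varied, and the knot set with the smallest criterion value would be kept. The `gamma` parameter controls how strongly large candidate sets are penalized.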