In applications, observed data increasingly come from one or more random variables taking values in an infinite-dimensional space, e.g. curves. The need for tools adapted to the nature of such data explains the growing interest in functional data analysis. The model we study in this paper assumes a linear dependence between a quantity of interest and several covariates, at least one of which is infinite-dimensional. To select the relevant covariates in this context, we investigate adaptations of the Lasso method. Two estimation methods are defined. The first minimizes a Group-Lasso criterion over the multivariate functional space H. The second minimizes the same criterion over a finite-dimensional subspace of H whose dimension is chosen by a penalized least-squares method. We prove sparsity oracle inequalities for both fixed and random designs. To compute the solutions of both criteria in practice, we propose a coordinate descent algorithm. A numerical study on simulated and real data illustrates the behavior of the estimators.
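The abstract does not spell out the coordinate descent algorithm, so the following is only a minimal sketch of one common variant, not the authors' implementation. It assumes each covariate has already been projected onto a finite-dimensional basis, so that covariate j is represented by an n-by-d_j matrix of coefficients, and it assumes the criterion (1/(2n))·||y − Σ_j X_j b_j||² + λ·Σ_j ||b_j||₂. Each block is updated by a proximal gradient step with group soft-thresholding. The names `group_soft_threshold`, `group_lasso_bcd`, `blocks`, and `lam` are hypothetical.

```python
import numpy as np


def group_soft_threshold(v, t):
    """Shrink the vector v toward zero by t in Euclidean norm (block prox)."""
    norm = np.linalg.norm(v)
    if norm <= t:
        return np.zeros_like(v)
    return (1.0 - t / norm) * v


def group_lasso_bcd(blocks, y, lam, n_iter=500, tol=1e-10):
    """Block coordinate descent for an assumed Group-Lasso criterion.

    blocks : list of (n, d_j) arrays, one per covariate (assumed to be
             basis coefficients of the projected functional covariates).
    y      : (n,) response vector.
    lam    : non-negative regularization parameter.
    Returns a list of coefficient vectors, one per block.
    """
    n = y.shape[0]
    coefs = [np.zeros(X.shape[1]) for X in blocks]
    # Per-block Lipschitz constants of the gradient of the squared loss.
    lips = [np.linalg.norm(X, 2) ** 2 / n for X in blocks]
    residual = y.astype(float).copy()

    for _ in range(n_iter):
        max_change = 0.0
        for j, X in enumerate(blocks):
            if lips[j] == 0.0:
                continue  # an all-zero block carries no information
            grad = -X.T @ residual / n
            new = group_soft_threshold(coefs[j] - grad / lips[j], lam / lips[j])
            delta = new - coefs[j]
            if np.any(delta):
                residual -= X @ delta  # keep the residual in sync
                coefs[j] = new
                max_change = max(max_change, np.linalg.norm(delta))
        if max_change < tol:
            break
    return coefs
```

Under this criterion, any `lam` at least as large as the maximum over j of ||X_jᵀ y||₂ / n returns all-zero blocks, which gives a natural upper end for a grid of regularization parameters. A block is either kept or discarded as a whole, which is what makes the penalty suitable for selecting covariates rather than individual basis coefficients.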