Functional regression analysis is an established tool for many contemporary scientific applications. Regression problems involving large and complex data sets are ubiquitous, and feature selection is crucial for avoiding overfitting and achieving accurate predictions. We propose a new, flexible, and ultra-efficient approach to feature selection in sparse, high-dimensional function-on-function regression, and we show how to extend it to the scalar-on-function framework. Our method, called FAStEN, combines techniques from functional data analysis, optimization, and machine learning to perform feature selection and parameter estimation simultaneously. We exploit the properties of functional principal components and the sparsity inherent to the Dual Augmented Lagrangian problem to significantly reduce computational cost, and we introduce an adaptive scheme to improve selection accuracy. In addition, we derive asymptotic oracle properties, which guarantee estimation and selection consistency for the proposed FAStEN estimator. Through an extensive simulation study, we benchmark our approach against the best existing competitors and demonstrate substantial gains in CPU time and selection performance, without sacrificing the quality of coefficient estimation. Together, the theoretical derivations and the simulation study provide strong motivation for our approach. Finally, we present an application to brain fMRI data from the AOMIC PIOP1 study.