As advances in measuring instruments and computers have accelerated the collection of massive data, functional data analysis (FDA) has attracted growing attention. FDA is a methodology that treats longitudinal data as functions and performs inference, including regression, on them. Converting observed data to functional form typically involves fitting them with basis functions, and the number of basis functions is commonly chosen to be smaller than the sample size. This paper casts doubt on this convention. Recent statistical theory has identified a phenomenon, the so-called double descent, in which an excess of parameters overcomes overfitting and yields accurate interpolation. Transferring this idea to the choice of the number of basis functions for functional data suggests that providing an excess number of bases can lead to accurate predictions. We explore this phenomenon in a functional regression problem and examine its validity through numerical experiments. In addition, through application to real-world datasets, we demonstrate that double descent is not confined to theory and numerical experiments; it also matters in practice.
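To make the idea of an excess number of bases concrete, the following is a minimal sketch and not the procedure used in this paper. It assumes a single noisy curve observed at n points, a Fourier basis on [0, 1), and a minimum-norm least-squares fit computed with the pseudoinverse. The names `fourier_basis`, `fit_min_norm`, `predict` and `errors`, the test function, the noise level and the sample size are all illustrative assumptions. Once the number of bases p reaches n, the fit interpolates the observations; whether the held-out error then decreases again depends on the data and the basis, so the sketch prints the errors rather than asserting a particular curve.

```python
import numpy as np

def fourier_basis(t, p):
    """Evaluate the first p Fourier basis functions on [0, 1) at points t.

    Column order: 1, sin(2*pi*t), cos(2*pi*t), sin(4*pi*t), cos(4*pi*t), ...
    """
    t = np.asarray(t, dtype=float)
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < p:
        cols.append(np.sin(2 * np.pi * k * t))
        if len(cols) < p:
            cols.append(np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)

def fit_min_norm(t, y, p):
    """Least-squares basis coefficients; for p > n the pseudoinverse
    returns the interpolating solution with the smallest Euclidean norm."""
    Phi = fourier_basis(t, p)
    return np.linalg.pinv(Phi) @ y

def predict(coef, t):
    """Evaluate the fitted basis expansion at points t."""
    return fourier_basis(t, len(coef)) @ coef

# Hypothetical toy data: a smooth curve observed with noise at n jittered points.
rng = np.random.default_rng(0)
n = 15
t_train = (np.arange(n) + rng.uniform(0.25, 0.75, size=n)) / n

def true_curve(t):
    return np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t)

y_train = true_curve(t_train) + 0.1 * rng.standard_normal(n)
t_test = np.linspace(0.0, 1.0, 200, endpoint=False)

def errors(p):
    """Training and held-out mean squared error for p basis functions."""
    coef = fit_min_norm(t_train, y_train, p)
    train_mse = np.mean((predict(coef, t_train) - y_train) ** 2)
    test_mse = np.mean((predict(coef, t_test) - true_curve(t_test)) ** 2)
    return train_mse, test_mse

# Sweep p from below the sample size (p < n) to well above it (p > n).
for p in (5, 10, 15, 31, 61, 121):
    train_mse, test_mse = errors(p)
    print(f"p={p:4d}  train MSE={train_mse:.3e}  test MSE={test_mse:.3e}")
```

The pseudoinverse is used here because, among all coefficient vectors that interpolate the data when p > n, it selects the one with the smallest norm, which is the estimator most often studied in the double descent literature. A penalized or otherwise regularized fit would be a different design choice and may behave differently.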