As advances in measuring instruments and computers have accelerated the collection of massive amounts of data, functional data analysis (FDA) has attracted a surge of attention. FDA treats longitudinal data as a collection of functions on which inference, including regression, is performed. Converting discrete observations into functions typically involves fitting them with basis functions, and conventionally the number of basis functions is chosen to be smaller than the sample size. This paper casts doubt on that convention. Recent statistical theory has revealed the so-called double-descent phenomenon, in which adding parameters beyond the point of overfitting can reduce prediction error again, even when the model interpolates the training data exactly. Applying this idea to the choice of the number of basis functions for functional data, we show that choosing an excess number of basis functions can lead to more accurate predictions. Specifically, we explore this phenomenon in a functional regression setting and examine its validity through numerical experiments. In addition, we analyze two real-world datasets to show that the double-descent phenomenon is not confined to theory and simulation, confirming its relevance in practical applications.
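The central claim concerns how prediction error behaves as the number of basis functions K passes the sample size n. The Python sketch below is not the authors' method. It illustrates one common way to probe this in a scalar-on-function regression y_i = ∫ X_i(t) β(t) dt + ε_i. The coefficient function β is expanded in K Fourier basis functions, the integrals are approximated by Riemann sums, and the coefficients are fitted by minimum-norm least squares via the pseudoinverse. This reduces to ordinary least squares when K < n and interpolates the training data when K ≥ n. The simulated curves, the coefficient function `beta(t)`, the noise level, and all helper names (`fourier_basis`, `design_matrix`, `fit_min_norm`, `simulate`, `prediction_mse`) are hypothetical choices made for illustration.

```python
import numpy as np


def fourier_basis(t, K):
    """Evaluate the first K Fourier basis functions on a grid t in [0, 1).

    Columns: 1, sqrt(2) sin(2*pi*t), sqrt(2) cos(2*pi*t), sqrt(2) sin(4*pi*t), ...
    """
    cols = [np.ones_like(t)]
    k = 1
    while len(cols) < K:
        cols.append(np.sqrt(2) * np.sin(2 * np.pi * k * t))
        if len(cols) < K:
            cols.append(np.sqrt(2) * np.cos(2 * np.pi * k * t))
        k += 1
    return np.column_stack(cols)  # shape (len(t), K)


def design_matrix(X, t, K):
    """Z[i, k] approximates the integral of X_i(t) * phi_k(t) by a Riemann sum."""
    dt = t[1] - t[0]
    return X @ fourier_basis(t, K) * dt


def fit_min_norm(Z, y):
    """Pseudoinverse solution: ordinary least squares when K < n,
    minimum-norm interpolant when K >= n and Z has full row rank."""
    return np.linalg.pinv(Z) @ y


def simulate(n, n_grid=200, n_components=199, noise=0.1, seed=0):
    """Hypothetical data: curves built from random Fourier scores with
    decaying variance, plus scalar responses from a fixed beta(t)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0.0, 1.0, n_grid, endpoint=False)
    scores = rng.normal(size=(n, n_components)) / np.arange(1, n_components + 1)
    X = scores @ fourier_basis(t, n_components).T  # shape (n, n_grid)
    beta = np.sin(2 * np.pi * t) + 0.5 * np.cos(4 * np.pi * t)  # hypothetical beta(t)
    y = X @ beta * (t[1] - t[0]) + noise * rng.normal(size=n)
    return t, X, y


def prediction_mse(K, train, test):
    """Fit with K basis functions on `train`, return mean squared error on `test`."""
    (t, X_tr, y_tr), (_, X_te, y_te) = train, test
    b = fit_min_norm(design_matrix(X_tr, t, K), y_tr)
    return np.mean((design_matrix(X_te, t, K) @ b - y_te) ** 2)


if __name__ == "__main__":
    train = simulate(n=40, seed=0)
    test = simulate(n=2000, seed=1)
    for K in [3, 5, 11, 21, 31, 39, 41, 51, 81, 151]:
        print(f"K={K:4d}  test MSE={prediction_mse(K, train, test):.4f}")
```

Running the script prints the test error for each K. Near the interpolation threshold K ≈ n = 40, the fit is highly unstable, so a spike in test error is expected there. Whether the error falls again, and how far, depends on the simulation settings. Because this toy `beta(t)` uses only two basis components, small K may still win here, and the sketch makes no claim to reproduce the paper's results.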