The recent success of large-scale models has increased the importance of statistical models with many parameters. Several studies have analyzed over-parameterized linear models with high-dimensional data that need not be sparse; however, existing results rely on the assumption that samples are independent. In this study, we analyze a linear regression model with dependent time-series data in an over-parameterized setting. We consider an interpolating estimator and develop a theory for its excess risk, deriving non-asymptotic risk bounds that hold for dependent data. This analysis reveals that the coherence of the temporal covariance plays a key role: the risk bound is influenced by the product of temporal covariance matrices at different time steps. Moreover, we derive the convergence rate of the risk bound and show that it is also influenced by the coherence of the temporal covariance. Finally, we provide several examples of specific dependent processes to which our setting applies.
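The setting described above can be illustrated with a small simulation. The following is a minimal sketch, not the authors' method: it assumes the interpolating estimator is the minimum-$\ell_2$-norm interpolator (the abstract does not specify which interpolator is used), it generates covariates whose coordinates each follow a stationary AR(1) process over time, and it uses independent observation noise for simplicity. The function names, the eigenvalue decay `1/j`, and all numeric settings are hypothetical choices made for illustration only.

```python
import numpy as np


def simulate_ar1_design(n, p, rho, eigvals, rng):
    """Return an (n, p) design whose rows are dependent over time.

    Assumption for illustration: each coordinate j is a stationary AR(1)
    process with autocorrelation rho and stationary variance eigvals[j],
    so the marginal covariance of every row is diag(eigvals).
    """
    scale = np.sqrt(eigvals)
    X = np.empty((n, p))
    X[0] = scale * rng.standard_normal(p)
    for t in range(1, n):
        innovation = scale * rng.standard_normal(p)
        X[t] = rho * X[t - 1] + np.sqrt(1.0 - rho**2) * innovation
    return X


def min_norm_interpolator(X, y):
    """Minimum l2-norm solution of X @ beta = y via the pseudoinverse."""
    return np.linalg.pinv(X) @ y


def excess_risk(beta_hat, beta_star, eigvals):
    """(beta_hat - beta_star)^T Sigma (beta_hat - beta_star), Sigma = diag(eigvals)."""
    diff = beta_hat - beta_star
    return float(np.sum(eigvals * diff**2))


rng = np.random.default_rng(0)
n, p, rho = 50, 500, 0.5                 # over-parameterized: p > n
eigvals = 1.0 / np.arange(1, p + 1)      # hypothetical decaying spectrum
beta_star = rng.standard_normal(p) / np.sqrt(p)

X = simulate_ar1_design(n, p, rho, eigvals, rng)
y = X @ beta_star + 0.1 * rng.standard_normal(n)  # independent noise (simplification)

beta_hat = min_norm_interpolator(X, y)
train_residual = float(np.max(np.abs(X @ beta_hat - y)))  # near zero: the fit interpolates
risk = excess_risk(beta_hat, beta_star, eigvals)
print(f"max training residual: {train_residual:.2e}, excess risk: {risk:.4f}")
```

Varying `rho` in this sketch changes the strength of temporal dependence while leaving the marginal covariance of each sample fixed, which gives a simple way to observe empirically how dependence affects the excess risk of an interpolating fit.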