This paper establishes bounds on the performance of empirical risk minimization for large-dimensional linear regression. We generalize existing results by allowing the data to be dependent and heavy-tailed. The analysis covers both identically and heterogeneously distributed observations. It is nonparametric in the sense that the relationship between the regressand and the regressors is left unspecified. Our main results show that the empirical risk minimizer achieves optimal performance, up to a logarithmic factor, in a dependent-data setting.
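To make the object of study concrete, the sketch below computes an empirical risk minimizer for linear regression on simulated dependent, heavy-tailed data. It assumes the square loss, so the minimizer of the empirical risk (1/n) Σ_t (y_t − x_t′b)² is the least-squares fit obtained from the normal equations. The data-generating process (AR(1) regressors for serial dependence, Student-t errors with 3 degrees of freedom for heavy tails) and the function names `simulate`, `empirical_risk`, and `erm_least_squares` are hypothetical choices made only for illustration; the code does not reproduce the paper's setting or its bounds.

```python
import math
import random


def simulate(n, beta, rho=0.5, df=3.0, seed=0):
    """Hypothetical data-generating process (an assumption, not from the paper):
    AR(1) regressors give serial dependence; Student-t errors give heavy tails."""
    rng = random.Random(seed)
    d = len(beta)
    x_prev = [0.0] * d
    X, y = [], []
    for _ in range(n):
        # Each regressor follows an AR(1) process, so observations are dependent.
        x = [rho * xp + rng.gauss(0.0, 1.0) for xp in x_prev]
        # Student-t draw: z / sqrt(chi2_df / df), where chi2_df ~ Gamma(df/2, scale=2).
        z = rng.gauss(0.0, 1.0)
        chi2 = rng.gammavariate(df / 2.0, 2.0)
        eps = z / math.sqrt(chi2 / df)
        X.append(x)
        y.append(sum(b * xi for b, xi in zip(beta, x)) + eps)
        x_prev = x
    return X, y


def empirical_risk(X, y, b):
    """Average squared loss: (1/n) * sum_t (y_t - x_t' b)^2."""
    n = len(y)
    return sum(
        (yt - sum(bj * xj for bj, xj in zip(b, xt))) ** 2 for xt, yt in zip(X, y)
    ) / n


def erm_least_squares(X, y):
    """Minimize the empirical squared-loss risk by solving the normal equations
    (X'X) b = X'y with Gaussian elimination and partial pivoting."""
    d = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(xt[i] * xt[j] for xt in X) for j in range(d)]
        + [sum(xt[i] * yt for xt, yt in zip(X, y))]
        for i in range(d)
    ]
    for col in range(d):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, d):
            factor = A[r][col] / A[col][col]
            for c in range(col, d + 1):
                A[r][c] -= factor * A[col][c]
    # Back substitution on the upper-triangular system.
    b = [0.0] * d
    for i in reversed(range(d)):
        b[i] = (A[i][d] - sum(A[i][j] * b[j] for j in range(i + 1, d))) / A[i][i]
    return b


beta_true = [1.0, -2.0, 0.5]
X, y = simulate(2000, beta_true)
beta_hat = erm_least_squares(X, y)
print(beta_hat)
```

Solving the normal equations is the simplest way to minimize a quadratic empirical risk in low dimension; in the large-dimensional regime the abstract targets, a numerically stabler solver (for example a QR- or SVD-based least-squares routine) would be the more natural choice.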