We study the classical problem of predicting an outcome variable, $Y$, using a linear combination of a $d$-dimensional covariate vector, $\mathbf{X}$. We are interested in linear predictors whose coefficients solve
\begin{align*}
\inf_{\boldsymbol{\beta} \in \mathbb{R}^d} \left( \mathbb{E}_{\mathbb{P}_n} \left[ \left|Y-\mathbf{X}^{\top}\boldsymbol{\beta} \right|^r \right] \right)^{1/r} +\delta \, \rho\left(\boldsymbol{\beta}\right),
\end{align*}
where $\delta>0$ is a regularization parameter, $\rho:\mathbb{R}^d\to \mathbb{R}_+$ is a convex penalty function, $\mathbb{P}_n$ is the empirical distribution of the data, and $r\geq 1$. For $r=2$ and $\rho$ equal to the $\ell_1$ norm, this is the square-root lasso. We present three sets of new results. First, we provide conditions under which linear predictors based on these estimators solve a \emph{distributionally robust optimization} problem: they minimize the worst-case prediction error over distributions that are close to the empirical distribution in a type of \emph{max-sliced Wasserstein metric}. Second, we provide a detailed finite-sample and asymptotic analysis of the statistical properties of the balls of distributions over which the worst-case prediction error is analyzed. Third, we use the distributionally robust optimality result and our statistical analysis to present i) an oracle recommendation for the choice of the regularization parameter, $\delta$, that guarantees good out-of-sample prediction error; and ii) a test statistic to rank the out-of-sample performance of two different linear estimators. None of our results rely on sparsity assumptions about the true data-generating process; thus, they broaden the applicability of the square-root lasso and related estimators in prediction problems.
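To make the objective concrete, the sketch below evaluates the empirical criterion $\left(\mathbb{E}_{\mathbb{P}_n}|Y-\mathbf{X}^{\top}\boldsymbol{\beta}|^r\right)^{1/r} + \delta\,\rho(\boldsymbol{\beta})$ for a given coefficient vector. It is a minimal illustration, not the authors' code: the function name `penalized_lr_objective`, the default choice of $\rho$ as the $\ell_1$ norm (so that $r=2$ gives the square-root lasso criterion), and the toy data are all hypothetical. It only evaluates the objective; it does not solve the minimization.

```python
import numpy as np


def l1_norm(beta):
    """Illustrative convex penalty rho(beta) = ||beta||_1 (an assumption, not the only choice)."""
    return np.sum(np.abs(beta))


def penalized_lr_objective(beta, X, y, delta, r=2.0, penalty=l1_norm):
    """Hypothetical helper: (E_{P_n} |Y - X^T beta|^r)^(1/r) + delta * rho(beta).

    X is an (n, d) array of covariates, y an (n,) array of outcomes; the
    expectation under the empirical distribution P_n is a sample mean.
    """
    if r < 1:
        raise ValueError("r must satisfy r >= 1")
    if delta <= 0:
        raise ValueError("delta must be strictly positive")
    residuals = y - X @ beta
    # Empirical L^r norm of the residuals: mean of |residual|^r, then the 1/r power.
    loss = np.mean(np.abs(residuals) ** r) ** (1.0 / r)
    return loss + delta * penalty(beta)


# Toy data (hypothetical), chosen so that beta = (1, 2) fits exactly.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]])
y = np.array([1.0, 2.0, 3.0, 0.0])

# Zero residuals, so only the penalty remains: 0.5 * (|1| + |2|).
print(penalized_lr_objective(np.array([1.0, 2.0]), X, y, delta=0.5))  # 1.5
# At beta = 0 the penalty vanishes and the loss is the root mean square of y.
print(penalized_lr_objective(np.zeros(2), X, y, delta=0.5))
```

Taking the absolute value of the residual keeps the loss a proper empirical $L^r$ norm for every $r\geq 1$, including non-integer and odd values. Passing a different convex `penalty` callable is one way to explore the "related estimators" covered by the general formulation.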