Convex estimators such as the Lasso, the matrix Lasso and the group Lasso have been studied extensively in the last two decades, demonstrating great success in both theory and practice. We introduce two quantities, the noise barrier and the large scale bias, that provide insight into the performance of these convex regularized estimators. It is now well understood that the Lasso achieves fast prediction rates, provided that the correlations of the design satisfy some Restricted Eigenvalue or Compatibility condition, and provided that the tuning parameter is large enough. Using the two quantities introduced in the paper, we show that the compatibility condition on the design matrix is actually unavoidable to achieve fast prediction rates with the Lasso: the Lasso must incur a loss due to the correlations of the design matrix, measured in terms of the compatibility constant. This result holds for any design matrix, any active subset of covariates, and any tuning parameter. It is also well known that the Lasso enjoys a dimension reduction property: the prediction error is of order $\lambda\sqrt k$, where $k$ is the sparsity, even if the ambient dimension $p$ is much larger than $k$. Such results require that the tuning parameter be greater than some universal threshold. We characterize sharp phase transitions for the tuning parameter of the Lasso around a critical threshold that depends on $k$. If $\lambda$ is equal to or larger than this critical threshold, the Lasso is minimax over $k$-sparse target vectors. If $\lambda$ is equal to or smaller than the critical threshold, the Lasso incurs a loss of order $\sigma\sqrt k$, which corresponds to a model of size $k$, even if the target vector has fewer than $k$ nonzero coefficients. Remarkably, the lower bounds obtained in the paper also apply to random, data-driven tuning parameters. The results extend to convex penalties beyond the Lasso.
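The role of the universal threshold $\lambda \approx \sigma\sqrt{2\log p}$ can be sketched in the simplest special case, an orthogonal design, where the Lasso reduces to coordinate-wise soft-thresholding. This is a hedged illustration, not the paper's construction: the dimension, sparsity level, and signal strength below are arbitrary choices made for the demo.

```python
# Orthogonal-design sketch of the Lasso tuning-parameter threshold.
# With an orthogonal design the Lasso solution is soft-thresholding of the
# observations, so the effect of lambda can be shown directly.
import numpy as np

rng = np.random.default_rng(0)
p, k, sigma = 2000, 10, 1.0                # ambient dimension, sparsity, noise level
beta = np.zeros(p)
beta[:k] = 10.0                            # k-sparse target with strong signal
y = beta + sigma * rng.standard_normal(p)  # sequence model: y_i = beta_i + noise

def lasso_orthogonal(y, lam):
    """Lasso solution under an orthogonal design: soft-thresholding at lam."""
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

# Universal threshold of order sigma * sqrt(2 log p).
lam_star = sigma * np.sqrt(2.0 * np.log(p))

# Squared prediction error ||beta_hat - beta||^2 at and far below the threshold.
err_tuned = np.sum((lasso_orthogonal(y, lam_star) - beta) ** 2)  # roughly k * lam_star^2
err_small = np.sum((lasso_orthogonal(y, 0.01) - beta) ** 2)      # roughly p * sigma^2

print(f"lambda* = {lam_star:.2f}")
print(f"error at lambda*:      {err_tuned:.1f}")
print(f"error at tiny lambda:  {err_small:.1f}")
```

At $\lambda = \lambda^*$ the error scales with the sparsity $k$ (of order $k\lambda^2$, matching the $\lambda\sqrt k$ prediction rate), whereas an undersized tuning parameter leaves the noise on all $p$ coordinates essentially untouched, so the error scales with the ambient dimension instead.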