We consider standard gradient descent, gradient flow and conjugate gradients as iterative algorithms for minimising a penalised ridge criterion in linear regression. While it is well known that conjugate gradients exhibit fast numerical convergence, the statistical properties of their iterates are more difficult to assess due to inherent non-linearities and dependencies. On the other hand, standard gradient flow is a linear method with well-known regularising properties when stopped early. By an explicit non-standard error decomposition we are able to bound the prediction error for conjugate gradient iterates by a corresponding prediction error of gradient flow at transformed iteration indices. This way, the risk along the entire regularisation path of conjugate gradient iterations can be compared to that for regularisation paths of standard linear methods like gradient flow and ridge regression. In particular, the oracle conjugate gradient iterate shares the optimality properties of the gradient flow and ridge regression oracles up to a constant factor. Numerical examples show the similarity of the regularisation paths in practice.
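The comparison described above can be illustrated with a small numerical sketch. This is not the paper's construction: the simulated data, the dimensions, and the fixed gradient-descent step size are all illustrative assumptions. It runs conjugate gradients on the normal equations of a least-squares problem alongside plain gradient descent, and tracks the error $\|X(\hat\beta_t - \hat\beta_{\mathrm{LS}})\|^2$, the $A$-norm error (with $A = X^\top X$) that conjugate gradients minimise over the Krylov subspace at every step.

```python
import numpy as np

# Illustrative simulated linear model y = X beta + noise (dimensions are arbitrary).
rng = np.random.default_rng(0)
n, p = 100, 20
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p)
y = X @ beta_true + 0.5 * rng.standard_normal(n)

A = X.T @ X                      # Gram matrix of the normal equations
b = X.T @ y
beta_ls = np.linalg.solve(A, b)  # least-squares solution, used as reference point

def a_norm_err(beta):
    # ||X(beta - beta_ls)||^2: the A-norm error that CG minimises at each step
    d = beta - beta_ls
    return float(d @ A @ d)

T = 15  # number of iterations along the regularisation path

# --- conjugate gradients on the normal equations, started at zero ---
beta = np.zeros(p)
r = b - A @ beta   # residual of the normal equations
d = r.copy()       # initial search direction
cg_err = [a_norm_err(beta)]
for _ in range(T):
    Ad = A @ d
    alpha = (r @ r) / (d @ Ad)          # exact line search along d
    beta = beta + alpha * d
    r_new = r - alpha * Ad
    d = r_new + ((r_new @ r_new) / (r @ r)) * d  # A-conjugate update of direction
    r = r_new
    cg_err.append(a_norm_err(beta))

# --- plain gradient descent with a safe fixed step size 1 / lambda_max(A) ---
step = 1.0 / np.linalg.eigvalsh(A)[-1]
beta_gd = np.zeros(p)
gd_err = [a_norm_err(beta_gd)]
for _ in range(T):
    beta_gd = beta_gd + step * (b - A @ beta_gd)
    gd_err.append(a_norm_err(beta_gd))
```

Since both iterates started at zero lie in the same Krylov subspace after $t$ steps and conjugate gradients minimise the $A$-norm error over that subspace, `cg_err[t]` is never larger than `gd_err[t]`, which makes the fast numerical convergence of conjugate gradients visible along the whole path.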