Second-order information -- such as curvature or data covariance -- is critical for optimisation, diagnostics, and robustness. However, in many modern settings, only gradients are observable. We show that gradients alone can reveal the Hessian, which for linear regression equals the data covariance $\Sigma$. Our key insight is a simple variance calibration: injecting Gaussian noise so that the total target-noise variance equals the batch size ensures that the empirical gradient covariance closely approximates the Hessian, even when evaluated far from the optimum. We provide non-asymptotic operator-norm guarantees under sub-Gaussian inputs. We also show that without such calibration, recovery can fail by an $\Omega(1)$ factor. The proposed method is practical (a "set the target-noise variance to $n$" rule) and robust (variance $\mathcal{O}(n)$ suffices to recover $\Sigma$ up to scale). Applications include preconditioning for faster optimisation, adversarial risk estimation, and gradient-only training, for example in distributed systems. We support our theoretical results with experiments on synthetic and real data.
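The calibration rule above can be illustrated with a minimal numerical sketch (our own, not taken from the paper): for the squared-error loss $\frac{1}{2n}\sum_i (x_i^\top w - y_i)^2$, label noise with variance equal to the batch size $n$ is injected, batch gradients are collected at a point away from the optimum, and their empirical covariance is compared to $\Sigma$ in operator norm. The choice of $\Sigma$, the offset from the optimum, and all constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, n_batches = 5, 512, 20000

# Illustrative ground-truth covariance Sigma (AR(1)-style Toeplitz matrix).
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))
L = np.linalg.cholesky(Sigma)

w_star = rng.standard_normal(d)
w = w_star + 1.0  # evaluate gradients away from the optimum

grads = np.empty((n_batches, d))
for b in range(n_batches):
    X = rng.standard_normal((n, d)) @ L.T      # rows ~ N(0, Sigma)
    eps = np.sqrt(n) * rng.standard_normal(n)  # injected noise: variance = batch size n
    y = X @ w_star + eps
    grads[b] = X.T @ (X @ w - y) / n           # gradient of 0.5 * mean squared error

# With the calibration, the covariance of the batch gradients approximates Sigma.
C = np.cov(grads, rowvar=False)
err = np.linalg.norm(C - Sigma, 2) / np.linalg.norm(Sigma, 2)
print(f"relative operator-norm error: {err:.3f}")
```

At the optimum the per-sample gradient is $-x_i\epsilon_i$, so the batch-gradient covariance is $(\sigma^2/n)\Sigma$; setting the noise variance $\sigma^2 = n$ makes this exactly $\Sigma$, and away from the optimum the signal term contributes only an $\mathcal{O}(\lVert w - w^\ast\rVert^2 / n)$ relative perturbation.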