We consider the problem of optimising the expected value of a loss functional over a nonlinear model class of functions, assuming that we only have access to realisations of the gradient of the loss. This is a classical task in statistics, machine learning and physics-informed machine learning. A straightforward solution is to replace the exact objective with a Monte Carlo estimate before employing standard first-order methods like gradient descent, which yields the classical stochastic gradient descent method. However, replacing the true objective with an estimate incurs a ``generalisation error''. Rigorous bounds for this error typically require strong compactness and Lipschitz continuity assumptions, yet provide only a very slow decay with the sample size. We propose a different optimisation strategy relying on a natural gradient descent in which the true gradient is approximated in local linearisations of the model class via (quasi-)projections based on optimal sampling methods. Under classical assumptions on the loss and the nonlinear model class, we prove that this scheme converges almost surely monotonically to a stationary point of the true objective, and we provide convergence rates.
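To make the proposed strategy concrete, the following is a minimal sketch of one such scheme on a toy problem, under stated assumptions rather than the construction analysed in the paper. The two-parameter model `a * sin(b * x)`, the target function, the step size and the sample count are all hypothetical choices made for illustration. The loss is taken to be half the squared $L^2$ distance to the target, so that a realisation of its gradient at a point `x` is the pointwise residual. The optimal sampling (quasi-)projection is replaced here by an ordinary least-squares projection with plain uniform samples, which is a simplification and not the sampling scheme the abstract refers to.

```python
import numpy as np


def model(theta, x):
    """Hypothetical nonlinear model class: u_theta(x) = a * sin(b * x)."""
    a, b = theta
    return a * np.sin(b * x)


def tangent_basis(theta, x):
    """Partial derivatives of the model w.r.t. (a, b), evaluated at x.

    Their span is the local linearisation of the model class at theta.
    """
    a, b = theta
    return np.stack([np.sin(b * x), a * x * np.cos(b * x)], axis=1)


def target(x):
    """Hypothetical target; it lies in the model class with theta = (1.5, 2.0)."""
    return 1.5 * np.sin(2.0 * x)


def natural_gradient_step(theta, rng, n_samples=200, step=0.5):
    """One step: project the sampled gradient onto the tangent space, then update.

    Assumption: plain uniform samples and ordinary least squares stand in
    for the optimal sampling (quasi-)projection.
    """
    x = rng.uniform(-1.0, 1.0, size=n_samples)
    # Realisations of the L2 gradient of 0.5 * ||u_theta - target||^2.
    grad_values = model(theta, x) - target(x)
    basis_values = tangent_basis(theta, x)
    # Coefficients of the empirical projection of the gradient onto the
    # tangent space, expressed in the basis of parameter derivatives.
    coeffs, *_ = np.linalg.lstsq(basis_values, grad_values, rcond=None)
    return theta - step * coeffs


def run(n_steps=50, seed=0):
    """Iterate the step from a fixed, hypothetical initial guess."""
    rng = np.random.default_rng(seed)
    theta = np.array([1.0, 1.5])
    for _ in range(n_steps):
        theta = natural_gradient_step(theta, rng)
    return theta


if __name__ == "__main__":
    print(run())
```

Because the projection coefficients are expressed in the basis of parameter derivatives, subtracting them from `theta` corresponds to a damped Gauss-Newton update, which is one common way of realising a natural gradient step for a squared $L^2$ loss. Fresh samples are drawn at every iteration, so no fixed empirical objective is ever formed.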