Functional optimization problems are typically solved by optimizing the parameters of a fixed representation, such as a neural network, which results in highly nonconvex losses that complicate both training and theoretical analysis. An appealing alternative is functional gradient descent (FGD), i.e., gradient descent performed directly in function space, which enjoys strong convergence results and admits a clean theory. However, FGD is difficult to implement in practice: functional gradients are infinite-dimensional, so they can be neither fully computed nor stored in memory. Existing implementations therefore rely on fixed approximations, which introduce approximation error. We propose a new, theoretically grounded FGD algorithm that adapts the representation of the functional gradients over the course of optimization. By explicitly incorporating this approximation into the analysis, we establish convergence to a stationary point for smooth losses, and to a global minimizer under smoothness together with a Polyak-Łojasiewicz-type condition, despite the approximation. To the best of our knowledge, this is the first implementable FGD method with such guarantees in a general setting. We demonstrate the effectiveness of our method on regression, the numerical solution of PDEs, and modern computer vision tasks. Across these settings, our method consistently outperforms both FGD with fixed approximations and neural network baselines in efficiency and accuracy.
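To make "gradient descent directly in function space" concrete, here is a minimal sketch of textbook FGD in a reproducing kernel Hilbert space (RKHS) for the empirical squared loss. It is *not* the adaptive algorithm proposed here. It assumes a Gaussian (RBF) kernel, and the names `rbf_kernel`, `kernel_fgd`, and `predict` are hypothetical. On a finite training set, the RKHS gradient of this loss happens to be a finite kernel expansion over the training points. This toy case therefore sidesteps the infinite-dimensionality problem described above, but it does show the update f ← f − η∇L(f) acting on the function itself rather than on network weights.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.5):
    # Gaussian (RBF) kernel matrix between 1-D point sets a and b (assumed kernel choice)
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def kernel_fgd(x, y, step=0.5, iters=200, lengthscale=0.5):
    """Hypothetical sketch: FGD on L(f) = (1/2n) * sum_i (f(x_i) - y_i)^2 in an RBF RKHS.

    f is stored through its coefficients, f = sum_j alpha_j * k(x_j, .).
    """
    n = len(x)
    K = rbf_kernel(x, x, lengthscale)
    alpha = np.zeros(n)                      # start from f_0 = 0
    losses = []
    for _ in range(iters):
        residual = K @ alpha - y             # f(x_i) - y_i at the training points
        losses.append(0.5 * np.mean(residual ** 2))
        # RKHS gradient is (1/n) * sum_i residual_i * k(x_i, .),
        # so the functional step f <- f - step * grad only changes alpha.
        alpha -= step * residual / n
    return alpha, losses

def predict(alpha, x_train, x_new, lengthscale=0.5):
    # Evaluate f(x_new) = sum_j alpha_j * k(x_j, x_new)
    return rbf_kernel(x_new, x_train, lengthscale) @ alpha
```

For example, `alpha, losses = kernel_fgd(np.linspace(-1, 1, 30), np.sin(3 * np.linspace(-1, 1, 30)))` fits a smooth 1-D target. The kernel matrix has unit diagonal, so its largest eigenvalue is at most n. With `step <= 1` the training loss is therefore non-increasing, which illustrates the clean convergence behavior that motivates FGD.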