We consider a class of optimization problems on the space of probability measures, motivated by the mean-field approach to studying neural networks. Such problems can be solved by constructing continuous-time gradient flows that converge to the minimizer of the energy function under consideration, and then implementing discrete-time algorithms that approximate the flow. In this work, we focus on the Fisher-Rao gradient flow and construct an interacting particle system that approximates the flow in its mean-field limit. We discuss the connection between the energy function, the gradient flow, and the particle system, and explain different approaches to smoothing the energy function with an appropriate kernel so that the particle system is well defined. We give a rigorous proof of existence and uniqueness for the resulting kernelized flows, together with a propagation of chaos result that theoretically justifies using the corresponding kernelized particle systems as approximation algorithms in entropic mean-field optimization.
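To make the role of the kernel concrete, the following is a minimal illustrative sketch, not the scheme constructed in this work. It assumes a toy entropy-regularized energy F(μ) = ∫ V dμ + σ ∫ log(μ) dμ with a quadratic potential V, and it evolves weights on fixed particle positions rather than the interacting particle system discussed above. The entropy term is undefined for an empirical measure, so the sketch replaces the density by a Gaussian-kernel smoothing. The function names, the kernel choice, and all parameter values are hypothetical.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth):
    """Gaussian kernel matrix K[i, j] = k(x_i - y_j) in one dimension."""
    d = x[:, None] - y[None, :]
    return np.exp(-0.5 * (d / bandwidth) ** 2) / (bandwidth * np.sqrt(2.0 * np.pi))


def fisher_rao_weights(positions, potential, sigma=0.1, bandwidth=0.3,
                       dt=0.05, steps=400):
    """Evolve particle weights along a kernelized Fisher-Rao-type flow.

    Assumed toy dynamics (not taken from the source):
        dw_i/dt = -w_i * (a_i - sum_j w_j a_j),
        a_i     = V(x_i) + sigma * log((K w)_i),
    where (K w)_i is the kernel-smoothed density at particle i.
    """
    n = positions.shape[0]
    w = np.full(n, 1.0 / n)                      # start from uniform weights
    K = gaussian_kernel(positions, positions, bandwidth)
    V = potential(positions)
    for _ in range(steps):
        density = K @ w                          # smoothed density, strictly positive
        a = V + sigma * np.log(density)          # kernelized first variation
        a_centered = a - np.dot(w, a)            # subtract the mean so mass is conserved
        w = w * np.exp(-dt * a_centered)         # multiplicative step keeps weights positive
        w /= w.sum()                             # renormalize against round-off drift
    return w


positions = np.linspace(-3.0, 3.0, 61)
weights = fisher_rao_weights(positions, lambda x: 0.5 * x ** 2)
print("second moment:", np.dot(weights, positions ** 2))
```

The multiplicative update is one simple way to keep the weights positive and normalized. Without the kernel, the logarithm of the density would have no meaning for a finite set of particles, which is the obstruction the smoothing is meant to remove.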