We consider a finite-horizon Mean Field Control problem for Markovian models. The objective function is a sum of convex and Lipschitz functions defined on a space of state-action distributions. We introduce an iterative algorithm, which we prove to be a Mirror Descent method associated with a non-standard Bregman divergence, with a convergence rate of order $1/\sqrt{K}$. Each iteration requires solving a simple dynamic programming problem. After reformulating our control problem as a game, we compare this algorithm with learning methods for Mean Field Games. These theoretical contributions are illustrated with numerical examples from a demand-side management problem for power systems, in which the goal is to control the average power consumption profile of a population of flexible devices contributing to the power system balance.
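To make the Mirror Descent idea concrete, the following is a minimal sketch of the textbook entropic variant on a probability simplex. It is not the algorithm of this paper: the Bregman divergence here is the standard Kullback-Leibler divergence rather than the non-standard one mentioned above, and there is no dynamic programming step. The function names, the target distribution, and the step-size choice are all illustrative assumptions.

```python
import math

def entropic_mirror_descent(subgrad, n, num_iters, step):
    """Textbook mirror descent on the simplex with the KL divergence.

    Illustrative only: `subgrad` is a hypothetical callable returning a
    subgradient of the objective at a distribution p (a list of length n).
    Returns the average of the iterates.
    """
    p = [1.0 / n] * n          # start from the uniform distribution
    avg = [0.0] * n
    for _ in range(num_iters):
        # accumulate the running average of the iterates
        for i in range(n):
            avg[i] += p[i] / num_iters
        g = subgrad(p)
        # multiplicative-weights update, then renormalize onto the simplex
        w = [p[i] * math.exp(-step * g[i]) for i in range(n)]
        total = sum(w)
        p = [wi / total for wi in w]
    return avg

# Assumed toy objective: l1 distance to a target distribution
# (convex and Lipschitz; a stand-in for a profile-tracking cost).
target = [0.7, 0.2, 0.1]

def objective(p):
    return sum(abs(pi - ti) for pi, ti in zip(p, target))

def subgrad(p):
    # sign of each coordinate gap is a valid subgradient of the l1 distance
    return [(pi > ti) - (pi < ti) for pi, ti in zip(p, target)]

K = 2000
eta = math.sqrt(2.0 * math.log(len(target)) / K)  # step size of order 1/sqrt(K)
avg = entropic_mirror_descent(subgrad, len(target), K, eta)
```

With a step size of order $1/\sqrt{K}$, the standard analysis of this variant bounds the suboptimality of the averaged iterate by a quantity of order $1/\sqrt{K}$, which is the same order as the rate stated in the abstract.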