Weak Convergence Analysis of Online Neural Actor-Critic Algorithms

We prove that a single-layer neural network trained with the online actor-critic algorithm converges in distribution to a random ordinary differential equation (ODE) as the number of hidden units and the number of training steps tend to infinity. In the online actor-critic algorithm, the distribution of the data samples changes dynamically as the model is updated, which is a key challenge for any convergence analysis. We establish the geometric ergodicity of the data samples under a fixed actor policy. Then, using a Poisson equation, we prove that the fluctuations of the model updates around the limit distribution, which are caused by the randomly arriving data samples, vanish as the number of parameter updates tends to infinity. Using the Poisson equation and weak convergence techniques, we prove that the actor neural network and the critic neural network converge to the solutions of a system of ODEs with random initial conditions. Analysis of the limit ODE shows that the limit critic network converges to the true value function, which provides the actor with an asymptotically unbiased estimate of the policy gradient. We then prove that the limit actor network converges to a stationary point.
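For readers who want a concrete picture of the setting, the following is a minimal sketch of an online actor-critic loop with single-hidden-layer actor and critic networks. It is not the paper's exact algorithm. The toy MDP, the one-hot state-action features, the `tanh` activation, the $1/\sqrt{N}$ output normalization, the $1/N$ learning rate, and all function names are assumptions chosen for illustration. The point it shows is that each update uses a single sample drawn under the current policy, so the data distribution shifts as the actor changes.

```python
import numpy as np

def init_net(n_hidden, dim, rng):
    # Random initial parameters (hypothetical choice of distributions).
    return {"c": rng.uniform(-1, 1, n_hidden), "w": rng.normal(size=(n_hidden, dim))}

def forward(net, x):
    # Single hidden layer with an assumed 1/sqrt(N) output normalization.
    h = np.tanh(net["w"] @ x)
    return net["c"] @ h / np.sqrt(len(net["c"])), h

def grads(net, x, h):
    # Gradients of the scalar network output with respect to c and w.
    scale = 1.0 / np.sqrt(len(net["c"]))
    g_c = scale * h
    g_w = scale * (net["c"] * (1.0 - h ** 2))[:, None] * x[None, :]
    return g_c, g_w

def one_hot(s, a, n_s, n_a):
    # One-hot encoding of a state-action pair (assumed feature map).
    x = np.zeros(n_s * n_a)
    x[s * n_a + a] = 1.0
    return x

def policy(actor, s, n_s, n_a):
    # Softmax policy over the actor network's outputs for each action.
    logits = np.array([forward(actor, one_hot(s, b, n_s, n_a))[0] for b in range(n_a)])
    z = np.exp(logits - logits.max())
    return z / z.sum()

def train(n_hidden=64, n_steps=2000, gamma=0.9, seed=0):
    rng = np.random.default_rng(seed)
    n_s, n_a = 3, 2                                    # toy MDP sizes (assumed)
    P = rng.dirichlet(np.ones(n_s), size=(n_s, n_a))   # transition probabilities
    R = rng.uniform(0, 1, size=(n_s, n_a))             # rewards
    critic = init_net(n_hidden, n_s * n_a, rng)
    actor = init_net(n_hidden, n_s * n_a, rng)
    lr = 1.0 / n_hidden                                # assumed 1/N step size
    s = 0
    a = rng.choice(n_a, p=policy(actor, s, n_s, n_a))
    for _ in range(n_steps):
        # One online sample: the next state and action come from the current policy.
        s2 = rng.choice(n_s, p=P[s, a])
        a2 = rng.choice(n_a, p=policy(actor, s2, n_s, n_a))
        x = one_hot(s, a, n_s, n_a)
        q, h = forward(critic, x)
        q2, _ = forward(critic, one_hot(s2, a2, n_s, n_a))
        delta = R[s, a] + gamma * q2 - q               # temporal-difference error
        g_c, g_w = grads(critic, x, h)
        # grad log pi(a|s) = grad f(s,a) - sum_b pi(b|s) grad f(s,b)
        pi = policy(actor, s, n_s, n_a)
        a_c = np.zeros_like(actor["c"])
        a_w = np.zeros_like(actor["w"])
        for b in range(n_a):
            xb = one_hot(s, b, n_s, n_a)
            _, hb = forward(actor, xb)
            gc_b, gw_b = grads(actor, xb, hb)
            coef = (1.0 if b == a else 0.0) - pi[b]
            a_c += coef * gc_b
            a_w += coef * gw_b
        # Critic: semi-gradient TD step. Actor: policy-gradient step using the critic's estimate.
        critic["c"] += lr * delta * g_c
        critic["w"] += lr * delta * g_w
        actor["c"] += lr * q * a_c
        actor["w"] += lr * q * a_w
        s, a = s2, a2
    return actor, critic, (n_s, n_a)
```

Calling `train()` and then `policy(actor, 0, *dims)` gives the learned action probabilities in state 0. Increasing `n_hidden` together with `n_steps` mimics, informally, the joint limit the abstract refers to.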

