Scaling Effects and Uncertainty Quantification in Neural Actor Critic Algorithms

We investigate the neural Actor Critic algorithm using shallow neural networks for both the Actor and Critic models. The focus of this work is twofold: first, to compare the convergence properties of the network outputs under various scaling schemes as the network width and the number of training steps tend to infinity; and second, to provide precise control of the approximation error associated with each scaling regime. Previous work has shown convergence to ordinary differential equations with random initial conditions under inverse square root scaling in the network width. In this work, we shift the focus from convergence speed alone to a more comprehensive statistical characterization of the algorithm's output, with the goal of quantifying uncertainty in neural Actor Critic methods. Specifically, we study a general inverse polynomial scaling in the network width, with an exponent treated as a tunable hyperparameter taking values strictly between one half and one. We derive an asymptotic expansion of the network outputs, interpreted as statistical estimators, in order to clarify their structure. To leading order, we show that the variance decays as a power of the network width, with an exponent equal to one half minus the scaling parameter, implying improved statistical robustness as the scaling parameter approaches one. Numerical experiments support this behavior and further suggest faster convergence when the scaling parameter is close to one. Finally, our analysis yields concrete guidelines for selecting algorithmic hyperparameters, including learning rates and exploration rates, as functions of the network width and the scaling parameter, so as to ensure provably favorable statistical behavior.
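To make the scaling concrete, the following is a minimal sketch of a shallow network whose output is multiplied by the width raised to the power minus the scaling parameter. It looks only at the spread of the output at random initialization, before any Actor or Critic training, so it illustrates the scaling itself and is not a reproduction of the paper's analysis or experiments. The names `shallow_output`, `init_std`, and `fitted_exponent`, the `tanh` activation, the parameter distributions, and the symbol `beta` for the scaling parameter are all assumptions made for this illustration.

```python
import numpy as np

def shallow_output(x, c, w, beta):
    """Width-N shallow network output, scaled by N**(-beta).

    c and w hold the outer and inner weights along the last axis.
    The tanh activation is an illustrative assumption.
    """
    n_units = c.shape[-1]
    return (c * np.tanh(w * x)).sum(axis=-1) / n_units**beta

def init_std(n_units, beta, n_draws=2000, x=1.0, seed=0):
    """Empirical standard deviation of the output over random initializations."""
    rng = np.random.default_rng(seed)
    # Assumed initialization: zero-mean outer weights, Gaussian inner weights.
    c = rng.uniform(-1.0, 1.0, size=(n_draws, n_units))
    w = rng.standard_normal(size=(n_draws, n_units))
    return shallow_output(x, c, w, beta).std()

def fitted_exponent(beta, widths=(100, 400, 1600)):
    """Log-log slope of the initialization spread against the width."""
    stds = [init_std(n, beta, seed=n) for n in widths]
    slope, _ = np.polyfit(np.log(widths), np.log(stds), 1)
    return slope

if __name__ == "__main__":
    for beta in (0.6, 0.75, 0.9):
        print(f"beta={beta}: fitted exponent {fitted_exponent(beta):+.3f}, "
              f"one half minus beta = {0.5 - beta:+.3f}")
```

Because the sum of N independent zero-mean terms has standard deviation of order the square root of N, the spread of the scaled output at initialization behaves like the width raised to one half minus `beta`, so the fitted slope should be close to that value and more negative for `beta` nearer one. This is only the elementary initialization effect; the behavior of the trained outputs is the subject of the asymptotic expansion described above.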
