This paper studies an infinite-horizon optimal control problem for discrete-time linear systems with quadratic criteria, in which both the system and the criteria involve random parameters that are independent and identically distributed over time. In this general setting, we apply the policy gradient method, a reinforcement learning technique, to search for the optimal control without requiring knowledge of the parameters' statistical properties. We investigate the sub-Gaussianity of the state process and establish a global linear convergence guarantee for this approach under assumptions that are weaker and easier to verify than those in existing results. Numerical experiments are presented to illustrate our results.
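The sketch below makes the setting concrete with a minimal model-free policy gradient loop on a scalar instance of the problem. Every detail is an illustrative assumption and not the paper's algorithm:

- the dynamics are x_{t+1} = a_t x_t + b_t u_t, with a_t and b_t drawn i.i.d. from Uniform(0.8, 1.2);
- the cost weights are fixed at q = r = 1;
- the policy is a linear feedback u_t = -k x_t;
- the horizon is truncated;
- the gradient is a two-point (zeroth-order) finite-difference estimate from simulated rollouts, using common random numbers.

The helper names `rollout_cost`, `sample_params` and `policy_gradient` are hypothetical. The optimizer only sees sampled trajectories and never uses the moments of a_t and b_t.

```python
import random


def rollout_cost(k, params, x0=1.0):
    """Cost of one truncated trajectory under the feedback u_t = -k * x_t.

    params is a list of (a_t, b_t) pairs, one per time step.
    Stage cost is q * x_t**2 + r * u_t**2 with q = r = 1 (assumed for simplicity).
    """
    x, cost = x0, 0.0
    for a, b in params:
        u = -k * x
        cost += x * x + u * u
        x = a * x + b * u  # x_{t+1} = a_t x_t + b_t u_t
    return cost


def sample_params(rng, horizon):
    # Hypothetical parameter law: a_t, b_t ~ Uniform(0.8, 1.2), independent
    # of each other and i.i.d. across time (bounded, hence sub-Gaussian).
    return [(rng.uniform(0.8, 1.2), rng.uniform(0.8, 1.2)) for _ in range(horizon)]


def policy_gradient(k0=1.0, step=0.1, iters=100, batch=100, horizon=30,
                    delta=1e-3, seed=0):
    """Gradient descent on the gain k using only simulated rollouts."""
    rng = random.Random(seed)
    k = k0  # k0 = 1.0 is mean-square stabilizing for the assumed parameter law
    for _ in range(iters):
        # Fresh parameter samples every iteration; no moments are used.
        batch_params = [sample_params(rng, horizon) for _ in range(batch)]
        # Two-point estimate with common random numbers: both perturbed gains
        # are evaluated on the same sampled parameter sequences.
        j_plus = sum(rollout_cost(k + delta, p) for p in batch_params) / batch
        j_minus = sum(rollout_cost(k - delta, p) for p in batch_params) / batch
        grad = (j_plus - j_minus) / (2 * delta)
        k -= step * grad
    return k
```

As a sanity check for this particular assumed distribution, take x_0 = 1. Whenever E[(a_t - b_t k)^2] < 1, the expected cost is J(k) = (q + r k^2) / (1 - E[(a_t - b_t k)^2]). Setting J'(k) = 0 gives k^2 + k - 1 = 0, so the value returned by `policy_gradient()` should land near k* = (√5 - 1)/2 ≈ 0.618.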