Meta-learning has emerged in recent years as a promising topic in machine learning, with important applications to image classification, robotics, computer games, and control systems. In this paper, we study the use of meta-learning to handle uncertainty and heterogeneity in ergodic linear quadratic regulators. We integrate zeroth-order optimization with a typical meta-learning method and propose an algorithm that omits estimation of the policy Hessian and applies to learning a set of heterogeneous but similar linear dynamic systems. When the set of linear dynamic systems is meta-learnable, the induced meta-objective function inherits important properties of the original cost function, allowing the algorithm to optimize over a learnable landscape without projection onto the feasible set. By analyzing the boundedness and smoothness of the gradient of the meta-objective, we provide a convergence result for the exact gradient descent process, which justifies the proposed algorithm when the gradient estimation error is small. We also provide a numerical example to corroborate this perspective.
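To make the idea of a Hessian-free, zeroth-order meta-gradient concrete, the following is a minimal sketch and not the paper's algorithm. It assumes a noise-free, finite-horizon surrogate of the LQR cost, a MAML-style meta-objective with one inner gradient step per system, and a two-point random-direction gradient estimator. All function names, step sizes, and the smoothing radius are hypothetical choices made for illustration.

```python
import numpy as np

def lqr_cost(K, A, B, Q, R, x0, horizon=50):
    """Average finite-horizon cost under u_t = -K x_t (noise-free surrogate, an assumption)."""
    x, cost = x0, 0.0
    for _ in range(horizon):
        u = -K @ x
        cost += x @ Q @ x + u @ R @ u
        x = A @ x + B @ u
    return cost / horizon

def zeroth_order_grad(f, K, radius, num_samples, rng):
    """Two-point gradient estimate of f at K from function values only."""
    grad = np.zeros_like(K)
    for _ in range(num_samples):
        U = rng.standard_normal(K.shape)
        U /= np.linalg.norm(U)  # uniform direction on the unit sphere
        grad += (f(K + radius * U) - f(K - radius * U)) / (2 * radius) * U
    # The factor K.size compensates for E[U U^T] = I / K.size on the unit sphere.
    return K.size * grad / num_samples

def meta_objective(K, tasks, inner_lr, radius, num_samples, seed):
    """Mean cost over systems after one zeroth-order inner gradient step each."""
    total = 0.0
    for A, B, Q, R, x0 in tasks:
        f = lambda P, A=A, B=B, Q=Q, R=R, x0=x0: lqr_cost(P, A, B, Q, R, x0)
        rng = np.random.default_rng(seed)  # same directions on every call
        K_adapted = K - inner_lr * zeroth_order_grad(f, K, radius, num_samples, rng)
        total += f(K_adapted)
    return total / len(tasks)

def meta_gradient(K, tasks, inner_lr=0.01, radius=0.05, num_samples=20, seed=0):
    """Zeroth-order estimate of the meta-gradient; no policy Hessian is formed."""
    rng = np.random.default_rng(seed + 1)
    f = lambda P: meta_objective(P, tasks, inner_lr, radius, num_samples, seed)
    return zeroth_order_grad(f, K, radius, num_samples, rng)
```

Under these assumptions, a single meta-update would look like `K = K - meta_lr * meta_gradient(K, tasks)`, where `tasks` is a list of `(A, B, Q, R, x0)` tuples and `meta_lr` is a hypothetical step size. Because the estimator differentiates the meta-objective only through function evaluations, the second-order term that an exact MAML gradient would require never appears explicitly.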