We propose a reinforcement learning (RL)-based algorithm to jointly train (1) a trajectory planner and (2) a tracking controller in a layered control architecture. Our algorithm arises naturally from a rewrite of the underlying optimal control problem that lends itself to an actor-critic learning approach. By explicitly learning a \textit{dual} network to coordinate the interaction between the planning and tracking layers, we demonstrate the ability to achieve an effective consensus between the two components, leading to an interpretable policy. We theoretically prove that our algorithm converges to the optimal dual network in the Linear Quadratic Regulator (LQR) setting and empirically validate its applicability to nonlinear systems through simulation experiments on a unicycle model.