We propose a framework for the design of feedback controllers that combines the optimization-driven and model-free advantages of deep reinforcement learning with the stability guarantees provided by using the Youla-Kucera parameterization to define the search domain. Recent advances in behavioral systems allow us to construct a data-driven internal model; this enables an alternative realization of the Youla-Kucera parameterization based entirely on input-output exploration data. Using a neural network to express a parameterized set of nonlinear stable operators enables seamless integration with standard deep learning libraries. We demonstrate the approach on a realistic simulation of a two-tank system.
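To illustrate why the Youla-Kucera parameterization makes a safe search domain, the sketch below simulates a loop in internal-model form. It is a minimal toy under stated assumptions, not the method of this work: the plant is a hypothetical stable scalar linear system, the internal model is a hand-written exact copy of it rather than one built from input-output data, and the stable operator Q is a finite impulse response filter rather than a neural network. The function name `simulate_imc` and the parameters `a`, `b`, and `q_taps` are illustrative only.

```python
import numpy as np


def simulate_imc(q_taps, a=0.9, b=0.1, steps=200, r=1.0):
    """Simulate a closed loop in internal-model (Youla-Kucera) form.

    Assumed toy plant and internal model (identical here):
        x[k+1] = a * x[k] + b * u[k],  y[k] = x[k],  with |a| < 1.
    Controller: u[k] = sum_i q_taps[i] * e[k-i],
    where e[k] = r - (y[k] - y_model[k]) and Q is an FIR filter,
    which is stable for any choice of taps.
    """
    q_taps = np.asarray(q_taps, dtype=float)
    x = 0.0                          # plant state
    x_model = 0.0                    # internal model state
    e_hist = np.zeros(len(q_taps))   # recent inputs to Q, newest first
    ys = np.empty(steps)
    for k in range(steps):
        y, y_model = x, x_model
        ys[k] = y
        # Only the plant/model mismatch is fed back; it is zero here.
        e = r - (y - y_model)
        e_hist = np.roll(e_hist, 1)
        e_hist[0] = e
        u = float(q_taps @ e_hist)
        # Plant and model are driven by the same input.
        x = a * x + b * u
        x_model = a * x_model + b * u
    return ys


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for _ in range(3):
        taps = rng.normal(size=5)
        ys = simulate_imc(taps)
        print("max |y| =", np.max(np.abs(ys)))
```

Because the model matches the plant in this toy, the feedback signal vanishes and the output is the plant driven by Q applied to the reference, so every choice of `q_taps` gives a bounded response. With `q_taps=[1.0]` the output settles near `r`, since the assumed plant has unit DC gain. An optimizer such as a reinforcement learning agent could therefore search over the parameters of Q freely, which is the property the framework relies on.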