We study the problem of generating control laws for systems with unknown dynamics. Our approach is to represent the controller and the value function with neural networks, and to train them using loss functions adapted from the Hamilton-Jacobi-Bellman (HJB) equations. Because no dynamics model is known in advance, our method first learns the state transition function offline, from data collected by interacting with the system. The learned transition function is then integrated into the HJB equations and used to forward-simulate the control signals produced by our controller in a feedback loop. In contrast to trajectory optimization methods, which optimize the controller for a single initial state, our controller can generate near-optimal control signals for initial states from a large portion of the state space. Compared to recent model-based reinforcement learning algorithms, we show that our method is more sample efficient and trains an order of magnitude faster. We demonstrate our method on a number of tasks, including the control of a quadrotor with 12 state variables.
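The two-stage pipeline described above (learn the dynamics offline, then train against an HJB-based loss that uses the learned model) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: a scalar linear system, a quadratic running cost, the infinite-horizon undiscounted form of the HJB equation, and closed-form stand-ins (`dVdx`, `policy`) in place of the neural networks. The names `fit_linear_dynamics` and `hjb_residual` are hypothetical.

```python
import math
import random

# Hypothetical "unknown" system xdot = A_TRUE*x + B_TRUE*u (illustration only).
A_TRUE, B_TRUE = -0.5, 2.0
Q, R = 1.0, 1.0  # assumed running cost l(x, u) = Q*x^2 + R*u^2


def fit_linear_dynamics(xs, us, xdots):
    """Least-squares fit of xdot ~ a*x + b*u via the 2x2 normal equations."""
    sxx = sum(x * x for x in xs)
    sxu = sum(x * u for x, u in zip(xs, us))
    suu = sum(u * u for u in us)
    sxy = sum(x * y for x, y in zip(xs, xdots))
    suy = sum(u * y for u, y in zip(us, xdots))
    det = sxx * suu - sxu * sxu
    a = (sxy * suu - suy * sxu) / det
    b = (suy * sxx - sxy * sxu) / det
    return a, b


def hjb_residual(x, u, dVdx, f):
    """Residual l(x, u) + V'(x) * f(x, u); zero for an optimal (V, u) pair."""
    return Q * x * x + R * u * u + dVdx(x) * f(x, u)


# Stage 1: collect transitions offline and fit the dynamics model.
random.seed(0)
xs = [random.uniform(-1.0, 1.0) for _ in range(200)]
us = [random.uniform(-1.0, 1.0) for _ in range(200)]
xdots = [A_TRUE * x + B_TRUE * u for x, u in zip(xs, us)]
a_hat, b_hat = fit_linear_dynamics(xs, us, xdots)


def f_hat(x, u):
    """Learned transition model, used in place of the true dynamics."""
    return a_hat * x + b_hat * u


# Stage 2: evaluate an HJB loss using the learned model.
# Stand-in for the value network: V(x) = p*x^2, with p taken from the
# scalar Riccati equation q + 2*a*p - b^2*p^2/r = 0 (positive root).
p = R * (a_hat + math.sqrt(a_hat**2 + b_hat**2 * Q / R)) / b_hat**2


def dVdx(x):
    return 2.0 * p * x


def policy(x):
    """Stand-in for the controller network: u = -(b*p/r)*x."""
    return -(b_hat * p / R) * x


def zero_policy(x):
    """Deliberately suboptimal controller for comparison."""
    return 0.0


# Sample initial states across the state space rather than a single one.
states = [i / 10.0 for i in range(-20, 21)]
loss = sum(hjb_residual(x, policy(x), dVdx, f_hat) ** 2 for x in states) / len(states)
bad_loss = sum(hjb_residual(x, zero_policy(x), dVdx, f_hat) ** 2 for x in states) / len(states)
```

In a learned setting, `loss` would be minimized with respect to the network parameters instead of being evaluated at a known solution; the sketch only shows that the residual vanishes for the optimal pair and not for a suboptimal controller.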