In recent years, reinforcement learning has produced striking results on control tasks involving highly non-linear systems. These impressive results often overshadow the potential vulnerabilities and uncertainties of the agents when they are deployed in the real world. Although their performance is remarkable compared to classical control algorithms, reinforcement learning-based methods fall short in two respects, robustness and interpretability, both of which are vital for contemporary real-world applications. This paper attempts to alleviate these problems and proposes the concept of model-assisted reinforcement learning to induce a notion of conservativeness in the agents. The control task considered in the experiments involves navigating a CrazyFlie quadrotor. The paper also describes a way of reformulating the task so that the level of conservativeness can be tuned via multi-objective reinforcement learning. The results include a comparison of vanilla reinforcement learning approaches with the proposed approach. The agents are evaluated by systematically injecting disturbances to characterize their inherent robustness and conservativeness. More concrete arguments are made by computing and comparing the backward reachability tubes of the RL policies, obtained by solving the Hamilton-Jacobi-Bellman partial differential equation (HJ PDE).