Model-based reinforcement learning has attracted much attention due to its high sample efficiency and is expected to be applied to real-world robotic applications. In the real world, unobservable disturbances can lead to unexpected situations, so robot policies should be designed to improve not only control performance but also robustness. Adversarial learning is an effective way to improve robustness, but an excessive adversary increases the risk of malfunction and makes the control performance too conservative. Therefore, this study proposes a new adversarial learning framework that makes reinforcement learning moderately robust without becoming overly conservative. To this end, adversarial learning is first rederived via variational inference. In addition, light robustness, which maximizes robustness within an acceptable performance degradation, is utilized as a constraint. As a result, the proposed framework, called LiRA, can automatically adjust the adversary level, balancing robustness and conservativeness. The expected behaviors of LiRA are confirmed in numerical simulations. In addition, LiRA succeeds in learning force-reactive gait control of a quadrupedal robot using only real-world data collected in less than two hours.