We propose and demonstrate a compositional framework for training and verifying reinforcement learning (RL) systems within a multifidelity sim-to-real pipeline, with the goal of deploying reliable and adaptable RL policies on physical hardware. By decomposing complex robotic tasks into component subtasks and defining mathematical interfaces between them, the framework allows the corresponding subtask policies to be trained and tested independently, while also providing guarantees on the overall behavior that results from their composition. By verifying the performance of these subtask policies in a multifidelity simulation pipeline, the framework not only enables efficient RL training but also allows the subtasks and their interfaces to be refined in response to discrepancies between simulation and reality. In an experimental case study, we apply the framework to train and deploy a compositional RL system that successfully pilots a Warthog unmanned ground robot.
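To make the idea of subtask interfaces and compositional guarantees concrete, the following is a minimal sketch, not the framework's actual implementation. It assumes a purely sequential chain of subtasks, models each interface as a named entry and exit condition, and assumes that each subtask policy succeeds with at least a stated probability from any state satisfying its entry condition. Under those assumptions, the product of the per-subtask bounds is a lower bound on the success probability of the composition. All names here (`Subtask`, `composed_success_lower_bound`, `subtasks_needing_refinement`, and the example subtasks) are hypothetical.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Subtask:
    """Hypothetical subtask specification (illustrative only)."""
    name: str
    entry: str                  # label of the entry condition
    exit: str                   # label of the exit condition
    success_lower_bound: float  # required probability of reaching `exit` from `entry`


def chain_is_consistent(subtasks):
    """Check that each subtask's exit condition matches the next subtask's entry condition."""
    return all(a.exit == b.entry for a, b in zip(subtasks, subtasks[1:]))


def composed_success_lower_bound(subtasks):
    """Lower bound on the success probability of running the subtasks in sequence.

    Assumes each bound holds from every state in the subtask's entry condition,
    so the bounds multiply along the chain.
    """
    if not chain_is_consistent(subtasks):
        raise ValueError("interface mismatch between consecutive subtasks")
    return prod(s.success_lower_bound for s in subtasks)


def subtasks_needing_refinement(subtasks, estimated_success):
    """Names of subtasks whose estimated success rate (for example, from a
    higher-fidelity simulator) falls below the required bound."""
    return [s.name for s in subtasks
            if estimated_success[s.name] < s.success_lower_bound]


# Hypothetical two-subtask plan for a ground robot.
plan = [
    Subtask("leave_dock", entry="dock", exit="corridor", success_lower_bound=0.99),
    Subtask("traverse_corridor", entry="corridor", exit="goal_region", success_lower_bound=0.95),
]
overall_bound = composed_success_lower_bound(plan)  # 0.99 * 0.95
flagged = subtasks_needing_refinement(
    plan, {"leave_dock": 0.995, "traverse_corridor": 0.90}
)
```

In this sketch, a subtask flagged by `subtasks_needing_refinement` would be a candidate for retraining or for a revised interface, which mirrors the refinement loop described above only at a very coarse level.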