This report presents our winning solution for the real-robot phase of the Real Robot Challenge (RRC) 2022. The goal of this year's challenge is to solve dexterous manipulation tasks with offline reinforcement learning (RL) or imitation learning. To this end, participants are provided with datasets containing dozens of hours of robotic data. For each task, both an expert dataset and a mixed dataset are provided. In our experiments, when learning from the expert datasets, we find that standard Behavioral Cloning (BC) outperforms state-of-the-art offline RL algorithms. When learning from the mixed datasets, BC performs poorly, as expected, while offline RL, surprisingly, also performs suboptimally, failing to match the average performance of the baseline model used to collect the datasets. To remedy this, and motivated by the strong performance of BC on the expert datasets, we use a semi-supervised classification technique to extract the expert data from the mixed datasets, and then perform BC on this extracted subset. To further improve results, in all settings we use a simple data augmentation method that exploits the geometric symmetry of the RRC physical robotic environment. Each of our submitted BC policies surpasses the mean return of its respective raw dataset, and the policies trained on the filtered mixed datasets come close to matching the performance of those trained on the expert datasets.
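To illustrate the kind of symmetry-based augmentation mentioned above, the following is a minimal sketch, not the implementation used for our submission. It assumes a three-finger robot whose fingers are spaced 120 degrees apart around the vertical axis, so that rotating the scene by a multiple of 120 degrees and relabeling the fingers yields another valid transition. The function names, the array layout (one row per finger), and the direction of the finger relabeling are all hypothetical; the relabeling direction in particular depends on how the fingers are indexed on the real platform.

```python
import numpy as np


def rotation_z(theta):
    """Return the 3x3 rotation matrix for an angle theta about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def augment_transition(finger_joints, object_pos, action, k):
    """Create a symmetric copy of one transition (illustrative layout).

    finger_joints: (3, 3) array, one row of joint angles per finger (assumed).
    object_pos:    (3,) Cartesian object position in the arena frame (assumed).
    action:        (3, 3) array, one row of joint commands per finger (assumed).
    k:             number of 120-degree steps to rotate by (0, 1, or 2).
    """
    rot = rotation_z(2.0 * np.pi * k / 3.0)
    # Joint-space quantities are unchanged per finger; only the finger
    # that owns each row changes, so the rows are cyclically shifted.
    joints_aug = np.roll(finger_joints, shift=k, axis=0)
    action_aug = np.roll(action, shift=k, axis=0)
    # Cartesian quantities are rotated about the vertical axis.
    pos_aug = rot @ object_pos
    return joints_aug, pos_aug, action_aug
```

Under these assumptions, calling `augment_transition` with `k = 1` and `k = 2` on every transition would triple the amount of training data available to BC without collecting any new robot data.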