This paper presents our solution for the Real Robot Challenge (RRC) III, a competition featured in the NeurIPS 2022 Competition Track that addresses dexterous robotic manipulation tasks through learning from pre-collected offline data. For each task, participants were provided with two types of datasets: an expert dataset and a mixed dataset containing data of varying skill levels. The simplest offline policy learning algorithm, Behavioral Cloning (BC), performed remarkably well when trained on the expert datasets, outperforming even the most advanced offline reinforcement learning (RL) algorithms. However, BC's performance deteriorated when it was trained on the mixed datasets, and the performance of offline RL algorithms on those datasets was also unsatisfactory. Upon examining the mixed datasets, we observed that they contained a significant amount of expert data, although this data was unlabeled. To address this issue, we proposed a semi-supervised learning-based classifier that identifies the underlying expert behavior within the mixed datasets, effectively isolating the expert data. To further enhance BC's performance, we leveraged the geometric symmetry of the RRC arena to augment the training dataset through mathematical transformations. In the end, our submission surpassed those of all other participants, including those who employed complex offline RL algorithms and intricate data processing and feature engineering techniques.
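The symmetry-based augmentation mentioned above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the arena is invariant under rotations by multiples of 2π/k about its vertical axis (with k = 3 as a default) and that the data consist of 3D positions; the function names `rotation_z` and `augment_by_rotation` are hypothetical and are not taken from our implementation. A real dataset would likely also require consistent transformation of other quantities, such as actions and per-finger channels.

```python
import numpy as np


def rotation_z(angle):
    """Return the 3x3 matrix for a rotation by `angle` radians about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


def augment_by_rotation(positions, k=3):
    """Augment (N, 3) positions with their k rotated copies.

    Assumption (illustrative only): the scene is symmetric under rotations
    by 2*pi/k about the vertical (z) axis. The first N rows of the result
    are the original points (rotation by zero).
    """
    positions = np.asarray(positions, dtype=float)
    copies = [positions @ rotation_z(2.0 * np.pi * i / k).T for i in range(k)]
    return np.concatenate(copies, axis=0)


# Usage: one point becomes k = 3 points on the same circle at the same height.
points = np.array([[1.0, 0.0, 0.5]])
augmented = augment_by_rotation(points, k=3)
```

Under the stated assumption, each rotated copy is as valid a training sample as the original, so the dataset grows by a factor of k without collecting new data.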