Robots should be able to learn complex behaviors from human demonstrations. In practice, these human-provided datasets are inevitably imbalanced: i.e., the human demonstrates some subtasks more frequently than others. State-of-the-art methods default to treating each element of the human's dataset as equally important. If, for instance, the majority of the human's data focuses on reaching a goal, and only a few state-action pairs move to avoid an obstacle, the learning algorithm will place greater emphasis on goal reaching. More generally, misalignment between the relative amounts of data and the importance of that data causes fundamental problems for imitation learning approaches. In this paper we analyze and develop learning methods that automatically account for mixed datasets. We formally prove that imbalanced data leads to imbalanced policies when each state-action pair is weighted equally; these policies emulate the most represented behaviors, and not the human's complex, multi-task demonstrations. We next explore algorithms that rebalance offline datasets (i.e., reweight the importance of different state-action pairs) without human oversight. Reweighting the dataset can enhance overall policy performance. However, there is no free lunch: each method for autonomous rebalancing brings its own pros and cons. We formulate these advantages and disadvantages, helping other researchers identify when each type of approach is most appropriate. We conclude by introducing a novel meta-gradient rebalancing algorithm that addresses the primary limitations of existing approaches. Our experiments show that dataset rebalancing leads to better downstream learning, improving the performance of general imitation learning algorithms without requiring additional data collection. See our project website: https://collab.me.vt.edu/data_curation/.
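To make the reweighting idea concrete, here is a minimal sketch (not from the paper) of how inverse-frequency rebalancing shifts loss mass between subtasks in an imbalanced demonstration dataset. The subtask labels and the 90/10 split are hypothetical, chosen only to illustrate the contrast with uniform weighting.

```python
import numpy as np

# Toy imbalanced dataset: 90 "reach" samples, 10 "avoid" samples.
# Subtask tags here are hypothetical, used only to illustrate reweighting.
subtasks = np.array(["reach"] * 90 + ["avoid"] * 10)

# Uniform weighting: every state-action pair counts equally,
# so the training loss is dominated by the majority subtask.
uniform_w = np.full(len(subtasks), 1.0 / len(subtasks))

# Inverse-frequency rebalancing: each subtask contributes equal total
# weight to the loss, regardless of how often it appears in the data.
labels, counts = np.unique(subtasks, return_counts=True)
freq = dict(zip(labels, counts))
balanced_w = np.array([1.0 / (len(labels) * freq[s]) for s in subtasks])

# Both weightings sum to 1, but the per-subtask mass differs:
print(round(uniform_w[subtasks == "avoid"].sum(), 2))   # 0.1
print(round(balanced_w[subtasks == "avoid"].sum(), 2))  # 0.5
```

Under uniform weighting the obstacle-avoidance samples receive only 10% of the total loss mass, whereas inverse-frequency weights give both subtasks equal influence; a learned (e.g., meta-gradient) scheme would instead treat these per-sample weights as parameters optimized against downstream policy performance.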