Data Quality in Imitation Learning

In supervised learning, the question of data quality and curation has been overshadowed in recent years by increasingly powerful and expressive models that can ingest internet-scale data. However, in offline learning for robotics, we simply lack internet-scale data, and so high-quality datasets are a necessity. This is especially true in imitation learning (IL), a sample-efficient paradigm for robot learning using expert demonstrations. Policies learned through IL suffer from state distribution shift at test time due to compounding errors in action prediction, which lead to unseen states from which the policy cannot recover. Instead of designing new algorithms to address distribution shift, an alternative perspective is to develop new ways of assessing and curating datasets. There is growing evidence that the same IL algorithm can perform substantially differently across datasets. This calls for a formalism for defining metrics of "data quality" that can in turn be leveraged for data curation. In this work, we take the first step toward formalizing data quality for imitation learning through the lens of distribution shift: a high-quality dataset encourages the policy to stay in distribution at test time. We propose two fundamental properties that shape the quality of a dataset: i) action divergence: the mismatch between the expert and learned policy at certain states; and ii) transition diversity: the noise present in the system for a given state and action. We investigate the combined effect of these two key properties in imitation learning theoretically, and we empirically analyze models trained on a variety of different data sources. We show that state diversity is not always beneficial, and we demonstrate how action divergence and transition diversity interact in practice.
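To make the two properties concrete, the following is a minimal Python sketch of simple empirical proxies for them. It is an illustration only, not the formal definitions used in this work: the function names `action_divergence` and `transition_diversity`, the squared-distance and variance measures, and the toy linear policies are all assumptions chosen for brevity.

```python
import numpy as np


def action_divergence(expert_actions, policy_actions):
    """Hypothetical proxy: mean squared distance between expert and
    learned-policy actions evaluated at the same states.

    Both inputs have shape (num_states, action_dim).
    """
    expert_actions = np.asarray(expert_actions, dtype=float)
    policy_actions = np.asarray(policy_actions, dtype=float)
    squared_dist = np.sum((expert_actions - policy_actions) ** 2, axis=-1)
    return float(np.mean(squared_dist))


def transition_diversity(next_states):
    """Hypothetical proxy: total variance of next states sampled from
    one fixed (state, action) pair.

    Input has shape (num_samples, state_dim).
    """
    next_states = np.asarray(next_states, dtype=float)
    return float(np.sum(np.var(next_states, axis=0)))


# Toy data (assumed): the expert acts as a = -s, the learned policy as a = -0.9 s.
rng = np.random.default_rng(0)
states = rng.normal(size=(100, 2))
expert = -states
learned = -0.9 * states
print("action divergence:", action_divergence(expert, learned))

# Toy noisy dynamics (assumed): s' = s + a + noise, sampled repeatedly
# for a single fixed state and action.
s, a = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
samples = s + a + 0.1 * rng.normal(size=(500, 2))
print("transition diversity:", transition_diversity(samples))
```

Under these assumptions, the first quantity grows as the learned policy drifts from the expert on the visited states, and the second grows with the noise scale of the dynamics at the chosen state and action.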
