Diagnosing and Augmenting Feature Representations in Correctional Inverse Reinforcement Learning

Robots have become increasingly good at performing tasks for humans by learning from their feedback, but they still often suffer from model misalignment caused by missing or incorrectly learned features. When the features a robot needs to perform its task are missing or do not generalize well to new settings, the robot cannot learn the task the human wants and, even worse, may learn a completely different and undesired behavior. Prior work shows how a robot can detect when its representation is missing a feature and then ask the human to teach it that feature; however, these works do not differentiate between features that are completely missing and those that exist but do not generalize to new environments. In the latter case, the robot would detect misalignment and simply learn a new feature, leading to an arbitrarily growing feature representation that can, in turn, cause spurious correlations and incorrect learning down the line. In this work, we propose separating the two sources of misalignment: we introduce a framework for determining whether a feature the robot needs is incorrectly learned and does not generalize to new environment setups, or is entirely missing from the robot's representation. Once the source of error is identified, we show how the human can initiate the realignment process for the model: if the feature is missing, we follow prior work for learning new features; if the feature exists but does not generalize, we use data augmentation to expand its training and thus complete the correction. We demonstrate the proposed approach in experiments with a simulated 7-DoF robot manipulator and physical human corrections.
