Policies often fail due to distribution shift: changes in the state and reward that occur when a policy is deployed in new environments. Data augmentation can increase robustness by making the model invariant to task-irrelevant changes in the agent's observation. However, designers do not know a priori which concepts are irrelevant, especially when different end users have different preferences about how the task is performed. We propose an interactive framework that leverages feedback directly from the user to identify personalized task-irrelevant concepts. Our key idea is to generate counterfactual demonstrations that allow users to quickly identify possible task-relevant and task-irrelevant concepts. The knowledge of task-irrelevant concepts is then used to perform data augmentation and thus obtain a policy adapted to personalized user objectives. We present experiments validating our framework on discrete and continuous control tasks with real human users. Our method (1) enables users to better understand agent failure, (2) reduces the number of demonstrations required for fine-tuning, and (3) aligns the agent to individual user task preferences.
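The augmentation step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the dictionary-based state, the concept names (`goal_color`, `wall_color`, `agent_pos`), and the helper `augment_demo` are all hypothetical, chosen only to show how user-flagged task-irrelevant concepts might be resampled while actions and task-relevant concepts stay fixed.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical representation (an assumption, not from the paper): a state is
# a dict of named concepts, and a demonstration is a list of (state, action).
State = Dict[str, object]
Demo = List[Tuple[State, str]]


def augment_demo(
    demo: Demo,
    irrelevant_values: Dict[str, List[object]],
    num_copies: int,
    rng: random.Random,
) -> List[Demo]:
    """Return copies of `demo` with user-flagged irrelevant concepts resampled.

    Each copy draws one value per irrelevant concept and applies it to every
    step, so the concept stays consistent within a trajectory. Actions and
    all other concepts are left unchanged.
    """
    augmented = []
    for _ in range(num_copies):
        sampled = {
            name: rng.choice(values)
            for name, values in irrelevant_values.items()
        }
        augmented.append(
            [({**state, **sampled}, action) for state, action in demo]
        )
    return augmented


# Toy demonstration with made-up concepts.
demo = [
    ({"goal_color": "red", "wall_color": "grey", "agent_pos": (0, 0)}, "right"),
    ({"goal_color": "red", "wall_color": "grey", "agent_pos": (0, 1)}, "down"),
]

# Assumed outcome of the user-feedback step: the user marked wall color as
# irrelevant to the task, with these admissible values.
user_feedback = {"wall_color": ["grey", "blue", "green"]}

copies = augment_demo(demo, user_feedback, num_copies=4, rng=random.Random(0))
```

A policy fine-tuned on `demo` together with `copies` sees the same actions under varied wall colors, which is one simple way to encourage invariance to a concept the user has marked as irrelevant. In a continuous control task the resampling would presumably act on observations rather than on a symbolic dictionary.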