Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification

Inverse reinforcement learning (IRL) aims to infer an agent's preferences (represented as a reward function $R$) from their behaviour (represented as a policy $\pi$). To do this, we need a behavioural model of how $\pi$ relates to $R$. In the current literature, the most common behavioural models are optimality, Boltzmann-rationality, and causal entropy maximisation. However, the true relationship between a human's preferences and their behaviour is much more complex than any of these behavioural models. This means that the behavioural models are misspecified, which raises the concern that they may lead to systematic errors if applied to real data. In this paper, we analyse how sensitive the IRL problem is to misspecification of the behavioural model. Specifically, we provide necessary and sufficient conditions that completely characterise how the observed data may differ from the assumed behavioural model without incurring an error above a given threshold. We also characterise the conditions under which a behavioural model is robust to small perturbations of the observed policy, and we analyse how robust many behavioural models are to misspecification of their parameter values (such as the discount rate). Our analysis suggests that the IRL problem is highly sensitive to misspecification, in the sense that very mild misspecification can lead to very large errors in the inferred reward function.
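To make the role of a behavioural model and its parameters concrete, the following is a minimal sketch, not taken from the paper, of one common formulation of Boltzmann-rationality: $\pi(a \mid s) \propto \exp(\beta Q^*(s, a))$, where $Q^*$ is the optimal Q-function for $R$ under discount rate $\gamma$. The two-state MDP, the function names `optimal_q` and `boltzmann_policy`, and the values of $\beta$ and $\gamma$ are all assumptions chosen for illustration.

```python
import numpy as np

def optimal_q(P, R, gamma, iters=1000):
    """Value iteration. P[s, a, s'] are transition probabilities, R[s, a] are rewards."""
    Q = np.zeros_like(R, dtype=float)
    for _ in range(iters):
        V = Q.max(axis=1)        # greedy state values, shape (S,)
        Q = R + gamma * (P @ V)  # Bellman optimality backup, shape (S, A)
    return Q

def boltzmann_policy(Q, beta):
    """Softmax over actions of beta * Q (assumed form of Boltzmann-rationality)."""
    z = beta * (Q - Q.max(axis=1, keepdims=True))  # shift for numerical stability
    p = np.exp(z)
    return p / p.sum(axis=1, keepdims=True)

# Hypothetical toy MDP: in state 0, action 0 ("stay") pays 1 and remains in
# state 0, while action 1 ("go") pays 0 and moves to state 1. State 1 is
# absorbing and pays 3 for either action.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0
P[0, 1, 1] = 1.0
P[1, :, 1] = 1.0
R = np.array([[1.0, 0.0],
              [3.0, 3.0]])

# Same reward function, two different discount rates.
pi_patient = boltzmann_policy(optimal_q(P, R, gamma=0.9), beta=1.0)
pi_myopic = boltzmann_policy(optimal_q(P, R, gamma=0.1), beta=1.0)

print("gamma=0.9, state 0:", pi_patient[0])
print("gamma=0.1, state 0:", pi_myopic[0])
```

In this toy example the same reward function produces a policy that mostly chooses "go" in state 0 when $\gamma = 0.9$ and mostly chooses "stay" when $\gamma = 0.1$. A learner that assumes the wrong discount rate would therefore have to attribute the observed behaviour to a different reward function, which is the kind of parameter misspecification the abstract refers to. The example only illustrates the setting and does not reproduce any of the paper's results.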
