This paper contributes a generalized formulation of correctional learning based on optimal transport, the theory of how to move one mass distribution onto another at minimal cost. Correctional learning is a teacher-student framework for improving the accuracy of parameter estimation. In this framework, an expert agent, the teacher, modifies the data used by a learning agent, the student, to improve the student's estimation process. The teacher's objective is to alter the data so that the student's estimation error is minimized, subject to a fixed intervention budget. Compared with existing formulations of correctional learning, our optimal transport approach provides several benefits: it allows more complex characteristics to be estimated, and it accommodates multiple intervention policies for the teacher. We evaluate our approach on two theoretical examples and on a human-robot interaction application in which the teacher's role is to improve the robot's performance in an inverse reinforcement learning setting.
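To make the budget-constrained teacher concrete, the following is a minimal sketch and not the paper's formulation. It assumes a finite support, a teacher who knows the true distribution, and a 0-1 transport cost (moving one unit of probability mass between any two distinct bins costs 1), under which the optimal transport cost between two distributions equals their total variation distance. The names `total_variation` and `correct_distribution` are hypothetical and chosen for illustration only.

```python
def total_variation(p, q):
    """Total variation distance between two distributions on the same support.

    Under a 0-1 ground cost this equals the optimal transport cost.
    """
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def correct_distribution(p_hat, p_true, budget):
    """Teacher's correction under a 0-1 transport cost (illustrative assumption).

    Moves at most `budget` units of probability mass from bins where the
    student's empirical distribution `p_hat` has too much mass to bins where
    it has too little, relative to `p_true`.
    """
    # Mass that would have to move to match p_true exactly.
    surplus = [max(a - b, 0.0) for a, b in zip(p_hat, p_true)]
    deficit = [max(b - a, 0.0) for a, b in zip(p_hat, p_true)]
    total = sum(surplus)  # equals sum(deficit) and TV(p_hat, p_true)
    if total == 0.0:
        return list(p_hat)  # nothing to correct
    # Fraction of the required mass the budget allows the teacher to move.
    frac = min(1.0, budget / total)
    return [a - frac * s + frac * d for a, s, d in zip(p_hat, surplus, deficit)]


if __name__ == "__main__":
    p_true = [0.5, 0.3, 0.2]   # known to the teacher (assumption)
    p_hat = [0.7, 0.2, 0.1]    # student's empirical distribution
    q = correct_distribution(p_hat, p_true, budget=0.1)
    print("corrected:", q)
    print("error before:", total_variation(p_hat, p_true))
    print("error after: ", total_variation(q, p_true))
```

In this simplified setting the correction is optimal: by the triangle inequality, no distribution within transport cost `budget` of `p_hat` can be closer to `p_true` than the original error minus the budget, and the sketch attains that bound. Richer ground costs or intervention policies would generally require solving a linear program instead.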