Data generation and labeling are often expensive in robot learning. Preference-based learning enables reliable labeling by querying users with preference questions. Active querying methods are commonly employed in preference-based learning to generate more informative data, at the expense of parallelization and computation time. In this paper, we develop a set of novel algorithms, batch active preference-based learning methods, that learn reward functions efficiently from as few data samples as possible while keeping query generation times short and retaining parallelizability. We introduce a method based on determinantal point processes (DPPs) for active batch generation, along with several heuristic-based alternatives. Finally, we present experimental results for a variety of robotics tasks in simulation. Our results suggest that our batch active learning algorithm requires only a few queries, which are computed in a short amount of time. We showcase one of our algorithms in a study to learn human users' preferences.
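To make the idea of DPP-based batch generation concrete, the following is a minimal sketch of one common way to select a diverse, informative batch: greedy approximate MAP inference over an L-ensemble kernel. It is an illustration under stated assumptions, not the method developed in the paper. The function name `greedy_dpp_batch`, the RBF similarity, the `length_scale` parameter, and the use of a per-query `scores` vector as an informativeness measure are all hypothetical choices made for this example.

```python
import numpy as np


def greedy_dpp_batch(features, scores, batch_size, length_scale=1.0):
    """Greedily select up to `batch_size` indices that approximately
    maximize det(L_S), where L is an L-ensemble DPP kernel.

    Assumptions (not taken from the paper): `features` is an (n, d) array
    describing candidate queries, `scores` is a positive length-n array of
    per-query informativeness, and similarity is an RBF kernel.
    """
    # Pairwise squared Euclidean distances between candidate queries.
    sq_dists = np.sum((features[:, None, :] - features[None, :, :]) ** 2, axis=-1)
    similarity = np.exp(-sq_dists / (2.0 * length_scale ** 2))

    # L_ij = q_i * S_ij * q_j: high scores raise the determinant,
    # high similarity between selected items lowers it.
    kernel = scores[:, None] * similarity * scores[None, :]

    selected = []
    for _ in range(batch_size):
        best_idx, best_logdet = None, -np.inf
        for i in range(len(features)):
            if i in selected:
                continue
            idx = selected + [i]
            sign, logdet = np.linalg.slogdet(kernel[np.ix_(idx, idx)])
            # Only positive determinants are valid; ties keep the earlier index.
            if sign > 0 and logdet > best_logdet:
                best_idx, best_logdet = i, logdet
        if best_idx is None:
            break  # no remaining candidate yields a positive determinant
        selected.append(best_idx)
    return selected


if __name__ == "__main__":
    # Two tight clusters of candidate queries; a diverse batch of size 2
    # should contain one query from each cluster.
    candidates = np.array([[0.0, 0.0], [0.01, 0.0], [5.0, 5.0], [5.01, 5.0]])
    informativeness = np.ones(4)
    print(greedy_dpp_batch(candidates, informativeness, batch_size=2))
```

The greedy loop recomputes a determinant for every candidate at every step, which is simple to read but scales poorly with the number of candidates. It is meant only to show how a DPP kernel trades off informativeness against redundancy within a batch.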