Humans use social context to specify preferences over behaviors, i.e., their reward functions. Yet algorithms for inferring reward models from preference data do not take this social learning view into account. Inspired by pragmatic human communication, we study how to elicit fine-grained information about why an example is preferred, and how to use it to learn more accurate reward models. We propose enriching binary preference queries to ask both (1) which features of a given example are preferable and (2) which of two examples is preferred overall. We derive an approach for learning from these feature-level preferences, both when users specify which features are reward-relevant and when they do not. We evaluate our approach in linear bandit settings in both vision- and language-based domains. Results support the efficiency of our approach, which converges to accurate rewards with fewer comparisons than example-only labels. Finally, we validate its real-world applicability with a behavioral experiment on a mushroom foraging task. Our findings suggest that incorporating pragmatic feature preferences is a promising approach for more efficient, user-aligned reward learning.
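To make the setup concrete, below is a minimal sketch of reward learning from both signal types under a linear reward model r(x) = w · φ(x). The Bradley-Terry likelihood for pairwise comparisons, the reading of a "feature i is preferable" label as evidence that the feature's reward contribution w_i φ_i(x) is positive, and the helper `fit_reward` are all illustrative assumptions; the abstract does not specify the paper's exact loss or derivation.

```python
import numpy as np

# Hypothetical sketch: fit a linear reward w from (a) example-level
# comparisons and (b) feature-level preference labels. Both signals are
# modeled with logistic (Bradley-Terry style) likelihoods as one plausible
# instantiation of the approach described in the abstract.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_reward(pairs, feature_prefs, dim, lr=0.1, steps=2000):
    """Gradient ascent on the joint log-likelihood.

    pairs: list of (phi_a, phi_b) feature vectors with phi_a preferred.
    feature_prefs: list of (phi, i, s), meaning the user marked feature i
        of example phi as preferable (s=+1) or not preferable (s=-1).
    """
    w = np.zeros(dim)
    for _ in range(steps):
        grad = np.zeros(dim)
        # Example-level term: P(a > b) = sigmoid(w . (phi_a - phi_b)).
        for phi_a, phi_b in pairs:
            d = phi_a - phi_b
            grad += (1.0 - sigmoid(w @ d)) * d
        # Feature-level term: a label on feature i is read as evidence
        # that s * w_i * phi_i > 0, contributing log sigmoid(s * w_i * phi_i).
        for phi, i, s in feature_prefs:
            z = s * w[i] * phi[i]
            grad[i] += (1.0 - sigmoid(z)) * s * phi[i]
        w += lr * grad
    return w
```

Under this reading, each feature-level label constrains a single coordinate of w directly, which is why such labels can reduce the number of full pairwise comparisons needed to identify the reward.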