In this paper, we address fairness in preference-based reinforcement learning (PbRL) in the presence of multiple objectives. The goal is to design control policies that optimize multiple objectives while treating each objective fairly. To this end, we propose fairness-induced preference-based reinforcement learning (FPbRL). The main idea of FPbRL is to learn vector reward functions associated with the multiple objectives from new welfare-based preferences, rather than the reward-based preferences used in PbRL, and to couple this with policy learning that maximizes a generalized Gini welfare function. Finally, we report experimental studies on three different environments, which show that the proposed FPbRL approach can achieve both efficiency and equity, yielding policies that are effective and fair.
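To make the welfare objective concrete, the following is a minimal sketch and not the paper's implementation. It uses the standard definition of the generalized Gini welfare function: the per-objective returns are sorted in ascending order and combined with non-increasing weights, so the worst-off objective counts the most. The function names, the example weights, and the Bradley-Terry form of the welfare-based preference are illustrative assumptions; the abstract does not specify them.

```python
import math
from typing import Sequence


def ggf_welfare(returns: Sequence[float], weights: Sequence[float]) -> float:
    """Generalized Gini welfare of a vector of per-objective returns.

    Returns are sorted ascending and paired with non-increasing weights,
    so the lowest return receives the largest weight.
    """
    if len(returns) != len(weights):
        raise ValueError("returns and weights must have the same length")
    if any(a < b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must be non-increasing")
    return sum(w * r for w, r in zip(weights, sorted(returns)))


def welfare_preference_prob(
    returns_a: Sequence[float],
    returns_b: Sequence[float],
    weights: Sequence[float],
) -> float:
    """Probability that A is preferred to B under a Bradley-Terry model.

    ASSUMPTION: the preference is driven by the difference in welfare
    scores. This link function is hypothetical, chosen for illustration.
    """
    diff = ggf_welfare(returns_a, weights) - ggf_welfare(returns_b, weights)
    # Numerically stable logistic function of the welfare difference.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


# Two outcomes with the same total return (10) over two objectives.
weights = [0.75, 0.25]   # illustrative weights, not from the paper
balanced = [5.0, 5.0]    # both objectives served equally
skewed = [9.0, 1.0]      # one objective favored heavily

welfare_balanced = ggf_welfare(balanced, weights)  # 0.75*5 + 0.25*5 = 5.0
welfare_skewed = ggf_welfare(skewed, weights)      # 0.75*1 + 0.25*9 = 3.0
prob_balanced_preferred = welfare_preference_prob(balanced, skewed, weights)
```

Both outcomes have the same sum of returns, yet the balanced one scores higher welfare, which is the sense in which maximizing this function trades off efficiency and equity. With equal weights the function reduces to a plain average, so the weight vector controls how strongly fairness is enforced.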