In this paper, we address fairness in preference-based reinforcement learning (PbRL) in the presence of multiple objectives. The goal is to design control policies that optimize multiple objectives while treating each objective fairly. To this end, we propose a new fairness-induced preference-based reinforcement learning approach, FPbRL. The main idea of FPbRL is to learn vector reward functions associated with the multiple objectives from new welfare-based preferences, rather than the reward-based preferences used in PbRL, and to couple this with policy learning that maximizes a generalized Gini welfare function. Finally, we provide experimental studies on three different environments to show that the proposed FPbRL approach can achieve both efficiency and equity, yielding policies that are effective and fair.
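To make the welfare criterion concrete, the following is a minimal sketch of a generalized Gini welfare function in its standard form: the per-objective returns are sorted from worst to best and combined with non-increasing weights, so the worst-off objective counts the most. The function name, the weight vector, and the example returns are illustrative assumptions and are not taken from the paper.

```python
from typing import Sequence


def generalized_gini_welfare(returns: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted sum of the returns sorted in ascending order.

    Assumes the standard definition: weights are non-increasing, so the
    smallest return is paired with the largest weight.
    """
    if len(returns) != len(weights):
        raise ValueError("returns and weights must have the same length")
    if any(later > earlier for earlier, later in zip(weights, weights[1:])):
        raise ValueError("weights must be non-increasing")
    worst_first = sorted(returns)  # ascending: worst-off objective first
    return sum(w * r for w, r in zip(weights, worst_first))


# Hypothetical weights and returns, chosen only for illustration.
weights = [0.5, 0.25, 0.125]
balanced = generalized_gini_welfare([4.0, 4.0, 4.0], weights)
skewed = generalized_gini_welfare([10.0, 1.0, 1.0], weights)
print(balanced, skewed)
```

Both example return vectors sum to 12, yet the balanced one receives the higher welfare score, which is the sense in which maximizing this function favors equitable outcomes across objectives.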