In traditional reinforcement learning (RL), the learner aims to solve a single-objective optimization problem: find the policy that maximizes expected reward. However, in many real-world settings, it is important to optimize over multiple objectives simultaneously. For example, when we are interested in fairness, states might have feature annotations corresponding to multiple (intersecting) demographic groups to whom reward accrues, and our goal might be to maximize the reward of the group receiving the minimum reward. In this work, we consider a multi-objective optimization problem in which each objective is defined by a state-based reweighting of a single scalar reward function. This generalizes the problem of maximizing the reward of the minimum-reward group. We provide oracle-efficient algorithms that solve these multi-objective RL problems even when the number of objectives is exponentially large: for tabular MDPs, as well as for large MDPs when the group functions have additional structure. Finally, we experimentally validate our theoretical results and demonstrate applications on a preferential attachment graph MDP.
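To make the fairness example concrete, here is a minimal sketch of the minimax objective described above; the symbols $\mathcal{G}$, $w_g$, and $H$ are illustrative notation assumed here, not taken from the paper. Each group $g$ in a (possibly exponentially large) collection $\mathcal{G}$ has a state-based weight function $w_g : \mathcal{S} \to [0,1]$ that reweights the single scalar reward $r$, and the learner seeks a policy whose worst-off group value is as large as possible:

```latex
% Hedged sketch of the minimax formulation; w_g, \mathcal{G}, and H
% are assumed notation for this illustration, not the paper's own.
\max_{\pi} \; \min_{g \in \mathcal{G}} \;
  \mathbb{E}_{\pi}\!\left[ \sum_{t=1}^{H} w_g(s_t)\, r(s_t, a_t) \right]
```

The special case where $w_g(s)$ is the indicator that state $s$ accrues reward to group $g$ recovers the problem of maximizing the reward of the minimum-reward group.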