While significant advancements have been made in the field of fair machine learning, the majority of studies focus on scenarios where the decision model operates on a static population. In this paper, we study fairness in dynamic systems where sequential decisions are made. Each decision may shift the underlying distribution of features or user behavior. We model the dynamic system through a Markov Decision Process (MDP). By acknowledging that traditional fairness notions and long-term fairness are distinct requirements that may not necessarily align with one another, we propose an algorithmic framework to integrate various fairness considerations with reinforcement learning using both pre-processing and in-processing approaches. Three case studies show that our method can strike a balance between traditional fairness notions, long-term fairness, and utility.
翻译:尽管公平机器学习领域已取得显著进展,但大多数研究聚焦于决策模型作用于静态群体的场景。本文研究了动态系统中序贯决策的公平性问题——每个决策可能改变用户特征或行为的基础分布。我们通过马尔可夫决策过程(MDP)对动态系统建模。鉴于传统公平概念与长期公平性作为不同要求可能并不必然一致,本文提出了一种算法框架,通过预处理和过程中处理方法将多种公平性考量与强化学习相结合。三个案例研究表明,我们的方法能够在传统公平概念、长期公平性与效用之间取得平衡。