Reinforcement learning (RL) aims to learn and evaluate a sequential decision rule, often referred to as a "policy", that maximizes the population-level benefit in an environment across possibly infinitely many time steps. However, the sequential decisions made by an RL algorithm, while optimized to maximize overall population benefit, may disadvantage individuals in minority or socioeconomically disadvantaged groups. To address this problem, we introduce PyCFRL, a Python library for ensuring counterfactual fairness in offline RL. PyCFRL implements a novel data preprocessing algorithm for learning counterfactually fair RL policies from offline datasets and provides tools to evaluate the values and counterfactual unfairness levels of RL policies. We describe the high-level functionalities of PyCFRL and demonstrate one of its major use cases through a data example. The library is publicly available on PyPI and GitHub (https://github.com/JianhanZhang/PyCFRL), and detailed tutorials can be found in the PyCFRL documentation (https://pycfrl-documentation.netlify.app).