We consider the challenge of AI value alignment with multiple individuals who have different reward functions and optimal policies in an underlying Markov decision process. We formalize this problem as one of policy aggregation, where the goal is to identify a desirable collective policy. We argue that an approach informed by social choice theory is especially suitable. Our key insight is that social choice methods can be reinterpreted by identifying ordinal preferences with volumes of subsets of the state-action occupancy polytope. Building on this insight, we demonstrate that a variety of methods, including approval voting, Borda count, the proportional veto core, and quantile fairness, can be practically applied to policy aggregation.
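To make the key insight concrete, here is a brief formal sketch in standard discounted-MDP notation (the notation is ours, not the abstract's): every policy induces a state-action occupancy measure, each agent's value is linear in that measure, and the set of achievable measures forms a polytope.

```latex
% Occupancy measure of a policy \pi (standard definition, discount factor \gamma):
\[
  d^{\pi}(s,a) \;=\; \sum_{t=0}^{\infty} \gamma^{t}\,
    \Pr\!\left[\, s_t = s,\ a_t = a \mid \pi \,\right].
\]
% Agent i's value is linear in d^\pi, so ordinal comparisons reduce to half-spaces:
\[
  V_i(\pi) \;=\; \sum_{s,a} d^{\pi}(s,a)\, r_i(s,a) \;=\; \langle d^{\pi}, r_i \rangle,
  \qquad
  \pi \succ_i \pi' \;\iff\; \langle d^{\pi} - d^{\pi'},\, r_i \rangle > 0.
\]
% The achievable measures form a polytope \mathcal{D}. One way to read the stated
% insight: the ordinal "rank" of \pi for agent i becomes the volume of the subset
% of \mathcal{D} that \pi weakly dominates under r_i:
\[
  \mathrm{rank}_i(\pi) \;=\; \mathrm{vol}\!\left(
    \{\, d \in \mathcal{D} \;:\; \langle d, r_i \rangle \le \langle d^{\pi}, r_i \rangle \,\}
  \right).
\]
```

Under this reading, discrete ballot counts over finitely many alternatives are replaced by volumes over a continuum of policies, which is what allows rules such as Borda count and the proportional veto core to be carried over to policy aggregation.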