In this paper, we study the expressivity of scalar, Markovian reward functions in Reinforcement Learning (RL) and identify several limitations to what they can express. Specifically, we look at three classes of RL tasks: multi-objective RL, risk-sensitive RL, and modal RL. For each class, we derive necessary and sufficient conditions that describe when a problem in that class can be expressed using a scalar, Markovian reward. Moreover, we find that scalar, Markovian rewards are unable to express most of the instances in each of these three classes. We thereby contribute to a more complete understanding of what standard reward functions can and cannot express. We also call attention to modal problems as a new class of problems, since they have so far not been given any systematic treatment in the RL literature. Finally, we briefly outline some approaches for solving some of the problems we discuss by means of bespoke RL algorithms.