Constrained Reinforcement Learning (CRL) is a branch of machine learning that introduces constraints into the traditional reinforcement learning (RL) framework. Unlike conventional RL, which aims solely to maximize cumulative reward, CRL incorporates additional constraints that encode mission requirements or limitations the agent must comply with during the learning process. In this paper, we address a class of CRL problems in which an agent aims to learn an optimal policy that maximizes reward while ensuring a desired level of temporal logic constraint satisfaction throughout the learning process. We propose a novel framework that switches between pure learning (reward maximization) and constraint satisfaction. The framework estimates the probability of constraint satisfaction from earlier trials and adjusts the probability of switching between the learning and constraint-satisfaction policies accordingly. We theoretically establish the correctness of the proposed algorithm and demonstrate its performance through comprehensive simulations.
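To make the switching mechanism concrete, the following is a minimal sketch of the idea described above: the agent tracks an empirical estimate of constraint satisfaction over past trials and biases its choice between a reward-maximizing policy and a constraint-satisfaction policy based on that estimate. The class name, the passed-in policies, and the specific adjustment rule are illustrative assumptions, not the paper's actual algorithm.

```python
import random

class SwitchingAgent:
    """Sketch of a switching scheme between a reward-maximizing
    (learning) policy and a constraint-satisfaction policy, biased by
    an empirical estimate of constraint satisfaction from past trials.
    All names and the adjustment rule below are assumptions."""

    def __init__(self, learning_policy, safe_policy, desired_level):
        self.learning_policy = learning_policy  # pure learning (reward maximization)
        self.safe_policy = safe_policy          # temporal logic constraint satisfaction
        self.desired_level = desired_level      # target satisfaction probability
        self.successes = 0                      # trials that satisfied the constraint
        self.trials = 0                         # total trials observed so far

    def satisfaction_estimate(self):
        # Empirical probability of constraint satisfaction from earlier trials.
        return self.successes / self.trials if self.trials > 0 else 0.0

    def choose_policy(self):
        # Hypothetical adjustment rule: if the running estimate falls below
        # the desired level, favor the constraint-satisfaction policy;
        # otherwise mostly follow the learning policy.
        p_safe = 1.0 if self.satisfaction_estimate() < self.desired_level else 0.1
        return self.safe_policy if random.random() < p_safe else self.learning_policy

    def record_trial(self, constraint_satisfied):
        # Update the empirical estimate after each trial.
        self.trials += 1
        if constraint_satisfied:
            self.successes += 1
```

In this sketch the switching probability is a simple threshold rule; the paper's framework adjusts it based on the estimated satisfaction probability, and the theoretical guarantees depend on that specific adjustment.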