We study a class of constrained reinforcement learning (RL) problems in which multiple constraint specifications are not identified before training. Identifying appropriate constraint specifications is challenging because the trade-off between reward maximization and constraint satisfaction, which is ubiquitous in constrained decision-making, is undefined. To tackle this issue, we propose a new constrained RL approach that searches for the policy and the constraint specifications together. The method adaptively relaxes the constraints according to a relaxation cost introduced in the learning objective. Because this feature mimics how ecological systems adapt to disruptions by altering their operation, we term our approach resilient constrained RL. Specifically, we provide a set of sufficient conditions that balance constraint satisfaction and reward maximization through the notion of a resilient equilibrium, propose a tractable formulation of resilient constrained policy optimization that takes this equilibrium as an optimal solution, and present two resilient constrained policy search algorithms with non-asymptotic convergence guarantees on the optimality gap and constraint satisfaction. Furthermore, we demonstrate the merits and effectiveness of our approach in computational experiments.
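To make the idea of searching for a policy and a constraint relaxation together more concrete, the following is a minimal toy sketch and not the algorithms of this work. It assumes a hypothetical two-armed bandit in which the policy is a single probability `p` of pulling a rewarding but unsafe arm, a safety requirement `1 - p >= b` that may be relaxed by `xi >= 0`, and a quadratic relaxation cost `0.5 * alpha * xi**2`. The function name `resilient_search`, the parameters `b` and `alpha`, the quadratic cost, and the use of exhaustive grid search are all illustrative assumptions.

```python
# Toy illustration only: every name and number here is an assumption,
# not taken from the paper. Exhaustive grid search stands in for the
# policy search algorithms described in the abstract.

def resilient_search(b=0.8, alpha=4.0, steps=100, tol=1e-9):
    """Jointly choose a policy p and a relaxation xi on a grid.

    Maximizes   reward(p) - 0.5 * alpha * xi**2
    subject to  safety(p) >= b - xi,  with 0 <= p <= 1 and 0 <= xi <= 1.
    """
    best = None
    for i in range(steps + 1):
        p = i / steps              # probability of the rewarding, unsafe arm
        reward = p                 # expected reward of policy p
        safety = 1.0 - p           # expected safety utility of policy p
        for j in range(steps + 1):
            xi = j / steps         # amount by which the constraint is relaxed
            if safety + tol < b - xi:
                continue           # relaxed constraint violated, skip
            objective = reward - 0.5 * alpha * xi ** 2
            if best is None or objective > best[0]:
                best = (objective, p, xi)
    return best


# Resilient solution: policy and relaxation chosen together.
objective, p_star, xi_star = resilient_search()

# Strict baseline: a prohibitively large relaxation cost forces xi = 0.
_, p_strict, xi_strict = resilient_search(alpha=1e9)

print(f"resilient: p={p_star:.2f}, xi={xi_star:.2f}, objective={objective:.3f}")
print(f"strict:    p={p_strict:.2f}, xi={xi_strict:.2f}")
```

In this toy, the relaxed constraint is active at the optimum, so `p = 1 - b + xi` and the objective reduces to `1 - b + xi - 0.5 * alpha * xi**2`, which is maximized at `xi = 1 / alpha`. With the assumed values `b = 0.8` and `alpha = 4`, this gives `xi = 0.25` and `p = 0.45`, whereas the strict problem with no relaxation yields `p = 0.2`. The relaxation cost therefore sets the trade-off between reward and constraint satisfaction that would otherwise have to be fixed before training.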