Deep reinforcement learning (DRL) has had success across various domains, but applying it to environments with constraints remains challenging due to poor sample efficiency and slow convergence. Recent literature has explored incorporating model knowledge to mitigate these problems, particularly through models that assess the feasibility of proposed actions. However, efficiently integrating feasibility models into DRL pipelines with continuous action spaces is non-trivial. We propose a novel DRL training strategy, action mapping, that leverages feasibility models to streamline the learning process. By decoupling the learning of feasible actions from policy optimization, action mapping allows DRL agents to focus on selecting the optimal action from a reduced feasible action set. We demonstrate through experiments that action mapping significantly improves training performance in constrained environments with continuous action spaces, especially with imperfect feasibility models.
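The core idea of decoupling feasibility from policy optimization can be illustrated with a minimal sketch. Here the feasibility model is reduced to a simple interval constraint and the function name `map_action` is a hypothetical stand-in, not the paper's implementation: the policy outputs a latent action in [0, 1], and a mapping turns it into an action that is feasible by construction, so the agent never has to learn which actions violate the constraint.

```python
import random

def map_action(z: float, low: float, high: float) -> float:
    """Map a latent action z in [0, 1] onto the feasible interval [low, high].

    Illustrative only: the interval stands in for an arbitrary feasibility
    model, which in general may define a more complex feasible set.
    """
    return low + z * (high - low)

# The policy learns only over the latent space [0, 1]; every mapped
# action lands inside the feasible set by construction.
random.seed(0)
for _ in range(5):
    z = random.random()                    # raw policy output (latent action)
    a = map_action(z, low=-0.2, high=0.7)  # guaranteed-feasible action
    assert -0.2 <= a <= 0.7
```

In this reduced latent space, policy optimization only chooses among feasible actions, which is what allows the approach to tolerate imperfect feasibility models: errors in the model change the shape of the mapped set rather than producing outright infeasible behavior during exploration.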