Ensuring the safety and reliability of autonomous agents in complex, dynamic environments remains a central challenge. Safe reinforcement learning addresses this by introducing safety constraints, but it still struggles in intricate environments such as complex driving scenarios. To overcome these difficulties, we present the safe constraint reward (Safe CoR) framework, a novel method that uses two types of expert demonstrations: reward expert demonstrations, which focus on performance optimization, and safe expert demonstrations, which prioritize safety. By exploiting a constraint reward (CoR), our framework guides the agent to balance the performance goal of maximizing the reward sum against the safety constraints. We test the proposed framework in diverse environments, including Safety Gym, MetaDrive, and the real-world Jackal platform. On the real-world Jackal platform, the framework improves the performance of algorithms by $39\%$ and reduces constraint violations by $88\%$, demonstrating its efficacy. We expect this approach to yield significant advances in real-world performance for safe and reliable autonomous agents.