This paper presents an approach for data-driven policy refinement in reinforcement learning, specifically designed for safety-critical applications. Our methodology leverages the strengths of data-driven optimization and reinforcement learning to enhance policy safety and optimality through iterative refinement. Our principal contribution lies in the mathematical formulation of this data-driven policy refinement concept. This framework systematically improves reinforcement learning policies by learning from counterexamples surfaced during data-driven verification. Furthermore, we present a series of theorems elucidating key theoretical properties of our approach, including convergence, robustness bounds, generalization error, and resilience to model mismatch. These results not only validate the effectiveness of our methodology but also contribute to a deeper understanding of its behavior in different environments and scenarios.