Q-learning is widely recognized as an effective approach for synthesizing controllers that achieve specified goals. However, handling continuous state-action spaces remains an open research problem. This paper presents a systematic analysis that exposes a major drawback of space-discretization methods. To address it, the paper proposes a symbolic model that captures behavioral relations, such as an alternating simulation relation from the abstraction to the controlled system. This relation allows a controller synthesized on the abstraction to be applied directly to the original system. The paper introduces a novel Q-learning technique for symbolic models that yields two Q-tables encoding optimal policies. Theoretical analysis shows that these Q-tables together provide upper and lower bounds on the Q-values of the original system with continuous spaces. The paper also characterizes the relationship between the parameters of the space abstraction and the resulting loss in Q-values. The resulting algorithm can achieve optimality to arbitrary accuracy, giving explicit control over the trade-off between accuracy and computational complexity. These results offer guidance for selecting appropriate learning parameters and refining the controller. Two case studies illustrate the engineering relevance of the proposed Q-learning-based symbolic model.
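The abstract does not spell out the learning update, so the following is only a minimal sketch of the bounding idea under stated assumptions, not the paper's algorithm. It assumes a hypothetical 1-D system x' = clip(x + u, 0, 1) with reward r(x) = -|x - 0.75|, abstracts [0, 1] into a uniform grid, over-approximates each abstract transition as the set of cells that meet the image interval, and then runs a pessimistic (min over successors, min reward in the cell) and an optimistic (max over successors, max reward in the cell) Bellman backup. To keep the example deterministic it swaps sampled Q-learning updates for plain Q-value iteration. The dynamics, reward, grid, and every name (`interval_q_tables`, `rollout_return`, `cell_of`, and so on) are illustrative assumptions.

```python
# Hypothetical toy system (an assumption, NOT the paper's benchmark):
#   x' = clip(x + u, 0, 1) on X = [0, 1], reward r(x) = -|x - X_GOAL|.
X_GOAL = 0.75
ACTIONS = (-0.1, 0.0, 0.1)
GAMMA = 0.9
TOL = 1e-12  # slack so cells touching an image boundary still count as successors


def clip(x):
    return min(1.0, max(0.0, x))


def reward(x):
    return -abs(x - X_GOAL)


def reward_bounds(lo, hi):
    """Exact min and max of reward(x) over the closed interval [lo, hi]."""
    r_min = min(reward(lo), reward(hi))  # r is concave, so its min is at an endpoint
    r_max = 0.0 if lo <= X_GOAL <= hi else max(reward(lo), reward(hi))
    return r_min, r_max


def cell_of(x, n_cells):
    """Index of a grid cell [i/n, (i+1)/n] that contains x."""
    return min(n_cells - 1, int(x * n_cells))


def interval_q_tables(n_cells, iters=500):
    """Lower and upper Q-tables on a uniform abstraction with n_cells cells."""
    eta = 1.0 / n_cells
    cells = [(i * eta, (i + 1) * eta) for i in range(n_cells)]
    n_act = len(ACTIONS)
    # Over-approximate transitions: every cell that meets the image interval.
    succ = {}
    for i, (lo, hi) in enumerate(cells):
        for a, u in enumerate(ACTIONS):
            nlo, nhi = clip(lo + u), clip(hi + u)  # clip is monotone
            succ[i, a] = [j for j, (clo, chi) in enumerate(cells)
                          if clo <= nhi + TOL and chi >= nlo - TOL]
    r = [reward_bounds(lo, hi) for lo, hi in cells]
    q_lo = [[0.0] * n_act for _ in cells]
    q_hi = [[0.0] * n_act for _ in cells]
    for _ in range(iters):
        v_lo = [max(row) for row in q_lo]
        v_hi = [max(row) for row in q_hi]
        # Pessimistic backup: worst reward in the cell, worst successor cell.
        q_lo = [[r[i][0] + GAMMA * min(v_lo[j] for j in succ[i, a])
                 for a in range(n_act)] for i in range(n_cells)]
        # Optimistic backup: best reward in the cell, best successor cell.
        q_hi = [[r[i][1] + GAMMA * max(v_hi[j] for j in succ[i, a])
                 for a in range(n_act)] for i in range(n_cells)]
    return q_lo, q_hi


def rollout_return(x, q_lo, n_cells, horizon=300):
    """Discounted return of the lower-table greedy controller on the concrete system."""
    total = 0.0
    for t in range(horizon):
        i = cell_of(x, n_cells)
        a = max(range(len(ACTIONS)), key=lambda k: q_lo[i][k])
        total += GAMMA ** t * reward(x)
        x = clip(x + ACTIONS[a])
    return total
```

In this sketch, `rollout_return` applies the controller obtained on the abstraction to the continuous system, and its return from a state x should lie between the lower and upper table values at x's cell. Comparing `interval_q_tables(10)` with `interval_q_tables(40)` should show the gap between the two tables shrinking as the cell width decreases, at the cost of a larger table, which is the kind of accuracy/complexity trade-off the abstract describes.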