Tabular reinforcement learning methods cannot operate directly on continuous state spaces. One solution to this problem is to partition the state space. A good partitioning enables generalization during learning and more efficient exploitation of prior experience. Consequently, the learning process becomes faster and produces more reliable policies. However, partitioning introduces approximation error, which is particularly harmful in the presence of nonlinear relations between state components. An ideal partition should be as coarse as possible while still capturing the key structure of the state space for the given problem. This work extracts partitions from the environment dynamics via symbolic execution. We show that symbolic partitioning improves state space coverage with respect to environmental behavior and allows reinforcement learning to perform better under sparse rewards. We evaluate symbolic state space partitioning with respect to precision, scalability, learning agent performance, and state space coverage of the learnt policies.