Linear temporal logic (LTL) and omega-regular objectives, a superset of LTL, have recently been used to express non-Markovian objectives in reinforcement learning. We introduce a model-based probably approximately correct (PAC) learning algorithm for omega-regular objectives in Markov decision processes. Unlike prior approaches, our algorithm learns from sampled trajectories of the system and does not require prior knowledge of the system's topology.