This paper studies control synthesis for motion planning under uncertainty. Uncertainties are considered in both the robot's motion and the environment's properties, giving rise to a probabilistic labeled Markov decision process (PL-MDP). A model-free reinforcement learning (RL) method is developed to generate a finite-memory control policy that satisfies high-level tasks expressed as linear temporal logic (LTL) formulas. Because uncertainties and potentially conflicting tasks can make a specification impossible to satisfy in full, this work focuses on infeasible LTL specifications: a relaxed LTL constraint is developed that allows the agent to revise its motion plan and accounts for violations of the original tasks, so that the specification is partially satisfied. A novel automaton is also developed to improve the density of accepting rewards and to enable deterministic policies. We propose an RL framework, supported by rigorous analysis, that is guaranteed to achieve multiple objectives in decreasing order of priority: 1) satisfying the acceptance condition of the relaxed product MDP and 2) reducing the violation cost over long-term behaviors. We provide simulation and experimental results to validate its performance.