In many robotic tasks, agents must traverse a sequence of spatial regions to complete a mission. Such problems are inherently mixed discrete-continuous: a solution consists of both a high-level action sequence and a physically feasible continuous trajectory. Together, these must satisfy problem constraints such as deadlines, time windows, and velocity or acceleration limits. Hybrid temporal planners attempt to address this challenge, but they typically model motion using linear (first-order) dynamics, which cannot guarantee that the resulting plan respects the robot's true physical constraints. Consequently, even when the high-level action sequence is fixed, producing a dynamically feasible trajectory becomes a bi-level optimization problem. We address this problem via reinforcement learning in continuous space. We define a Markov Decision Process that explicitly incorporates analytical second-order constraints, and we use it to refine first-order plans generated by a hybrid planner. Our results show that this approach can reliably recover physical feasibility, bridging the gap between a planner's initial first-order trajectory and the dynamics required for real execution.
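To illustrate why a first-order plan may not be physically executable, the following minimal sketch compares the traversal time a constant-velocity (first-order) model would assign to a segment with the minimum time achievable under an acceleration limit. It is not the method described above. It assumes a single 1D segment, rest-to-rest motion, and symmetric acceleration and braking limits, and the function names (`first_order_time`, `second_order_min_time`, `is_feasible`) are hypothetical.

```python
import math


def first_order_time(distance: float, v_max: float) -> float:
    """Traversal time if the robot could move at v_max instantly (first-order model)."""
    return distance / v_max


def second_order_min_time(distance: float, v_max: float, a_max: float) -> float:
    """Minimum rest-to-rest traversal time under velocity and acceleration limits.

    Assumption: 1D motion, start and end at zero velocity, and the same
    limit a_max for acceleration and braking.
    """
    # Distance consumed by accelerating to v_max and braking back to rest.
    ramp_distance = v_max ** 2 / a_max
    if distance >= ramp_distance:
        # Trapezoidal profile: accelerate, cruise at v_max, brake.
        return distance / v_max + v_max / a_max
    # Triangular profile: v_max is never reached.
    return 2.0 * math.sqrt(distance / a_max)


def is_feasible(distance: float, allotted_time: float,
                v_max: float, a_max: float) -> bool:
    """Check whether an allotted duration is achievable under second-order limits."""
    return allotted_time >= second_order_min_time(distance, v_max, a_max) - 1e-9


if __name__ == "__main__":
    d, v, a = 10.0, 2.0, 1.0
    t_plan = first_order_time(d, v)
    print(t_plan)                           # 5.0
    print(second_order_min_time(d, v, a))   # 7.0
    print(is_feasible(d, t_plan, v, a))     # False
```

In this example, the first-order model schedules 5 time units for a segment that needs at least 7 under the acceleration limit, so any deadline or time window computed from the first-order duration may be violated in execution. A refinement step has to close gaps of this kind.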