This work addresses autonomous contingency planning for scientific missions by enabling rapid policy computation from any off-nominal point in the state space when a delay or deviation from the nominal mission plan occurs. Successful contingency planning requires managing risks and rewards that are often probabilistically associated with actions in stochastic scenarios. Markov Decision Processes (MDPs) provide a mathematical model for decision-making in such scenarios. In planetary rover traverse planning, however, the vast action space and long planning horizon pose computational challenges. We propose a bi-level MDP framework that improves computational tractability while aligning with existing mission planning practices and enhancing the explainability and trustworthiness of AI-driven solutions. We discuss the conversion of a mission planning MDP into a bi-level MDP and test the framework on RoverGridWorld, a modified GridWorld environment for rover mission planning. We demonstrate that the bi-level MDP approach is computationally tractable and yields near-optimal policies, and we highlight the trade-off between compute time and policy optimality as problem complexity grows. This work facilitates more efficient and flexible contingency planning for scientific missions.
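To make the MDP formulation mentioned in the abstract concrete, the following is a minimal sketch of value iteration on a small stochastic grid. It is not the paper's RoverGridWorld environment or its bi-level method: the grid size, the `GOAL` and `HAZARD` cells, the reward values, the slip probability `SLIP`, and the discount `GAMMA` are all illustrative assumptions chosen only to show how a policy can be recomputed for every state, including off-nominal ones.

```python
# Toy value iteration on a small stochastic grid.
# Everything here (layout, rewards, SLIP, GAMMA) is a hypothetical example,
# not the RoverGridWorld environment described in the abstract.

ROWS, COLS = 3, 4
GOAL = (0, 3)      # assumed science target: +1 on arrival
HAZARD = (1, 3)    # assumed hazard cell: -1 on arrival
TERMINAL = {GOAL, HAZARD}
ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
SLIP = 0.2         # assumed probability that the rover fails to move
GAMMA = 0.95       # assumed discount factor
STEP_COST = -0.04  # assumed cost of entering any non-terminal cell


def step(state, action):
    """Intended move; attempts to leave the grid keep the rover in place."""
    r, c = state
    dr, dc = ACTIONS[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < ROWS and 0 <= nc < COLS:
        return (nr, nc)
    return state


def reward(next_state):
    """Reward depends only on the cell the rover ends up in."""
    if next_state == GOAL:
        return 1.0
    if next_state == HAZARD:
        return -1.0
    return STEP_COST


def q_value(V, state, action):
    """Expected return of taking `action` in `state` under value estimate V."""
    outcomes = [(1.0 - SLIP, step(state, action)), (SLIP, state)]
    return sum(p * (reward(s2) + GAMMA * V[s2]) for p, s2 in outcomes)


def value_iteration(tol=1e-8):
    """In-place Bellman backups until the largest update falls below tol."""
    states = [(r, c) for r in range(ROWS) for c in range(COLS)]
    V = {s: 0.0 for s in states}  # terminal states keep value 0
    while True:
        delta = 0.0
        for s in states:
            if s in TERMINAL:
                continue
            best = max(q_value(V, s, a) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy covers every non-terminal state, not just a nominal path.
    policy = {
        s: max(ACTIONS, key=lambda a: q_value(V, s, a))
        for s in states
        if s not in TERMINAL
    }
    return V, policy


if __name__ == "__main__":
    V, policy = value_iteration()
    for r in range(ROWS):
        print(" ".join(policy.get((r, c), "*") for c in range(COLS)))
```

Because the resulting policy is defined for every cell, a rover that drifts off its planned route can look up an action directly. The cost of this exhaustive sweep grows with the number of states and actions, which is the scaling concern the abstract raises for long-horizon traverse planning.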