Optimal decision-making presents a significant challenge for autonomous systems operating in uncertain, stochastic, and time-varying environments. Environmental variability over time can significantly impact the system's optimal decision-making strategy for mission completion. To model such environments, our work combines the previous notion of Time-Varying Markov Decision Processes (TVMDP) with partial observability and introduces Time-Varying Partially Observable Markov Decision Processes (TV-POMDP). We propose a two-pronged approach to accurately estimate and plan within the TV-POMDP: 1) Memory Prioritized State Estimation (MPSE), which leverages weighted memory to provide more accurate time-varying transition estimates; and 2) an MPSE-integrated planning strategy that optimizes long-term rewards while accounting for temporal constraints. We validate the proposed framework and algorithms in simulation and on hardware, with robots exploring partially observable, time-varying environments. Our results demonstrate superior performance over standard methods, highlighting the framework's effectiveness in stochastic, uncertain, time-varying domains.
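To make the idea of weighting memory for time-varying transition estimates concrete, the following is a minimal sketch only. The abstract does not specify how MPSE assigns its weights, so this example swaps in a simple exponential recency weighting as a stand-in; the function name `weighted_transition_estimate`, the `decay` and `prior` parameters, and the toy memory are all hypothetical and not taken from the paper.

```python
import numpy as np


def weighted_transition_estimate(memory, n_states, n_actions, now,
                                 decay=0.9, prior=1e-3):
    """Estimate P(s' | s, a) at time `now` from (t, s, a, s_next) tuples.

    Assumption (not from the paper): each remembered transition is weighted
    by decay ** (now - t), so older experience counts less. `prior` is a
    small uniform pseudo-count that keeps rows for unseen (s, a) pairs
    well defined.
    """
    counts = np.full((n_states, n_actions, n_states), prior, dtype=float)
    for t, s, a, s_next in memory:
        counts[s, a, s_next] += decay ** (now - t)
    # Normalize over the next-state axis so each (s, a) row is a distribution.
    return counts / counts.sum(axis=2, keepdims=True)


# Toy memory: action 0 in state 0 used to lead to state 1,
# but more recently leads to state 2.
memory = [(0, 0, 0, 1), (1, 0, 0, 1), (8, 0, 0, 2), (9, 0, 0, 2)]
P = weighted_transition_estimate(memory, n_states=3, n_actions=1, now=10)
```

With equal numbers of old and recent samples, the recent outcome (state 2) receives the larger estimated probability, which is the qualitative behavior one would want from a time-varying transition estimate. An unweighted count would instead split the probability evenly between the two outcomes.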