Synthesising autonomous agents that can navigate uncertain environments while adhering to complex temporal constraints remains a fundamental challenge. While Linear Temporal Logic (LTL) provides a rigorous language for specifying such tasks, the inherent undecidability of qualitatively verifying LTL satisfaction in partially observable Markov decision processes renders quantitative synthesis difficult, especially when designing reliable reward signals for approximate solvers. In this paper, we bridge this gap with a novel, sound reward-shaping mechanism that dynamically generates belief-dependent rewards grounded in certified LTL satisfaction. By integrating this mechanism into an enhanced Monte Carlo Planning framework, we empower agents to navigate the `fog' of partial observability with a search process focused on maximising verifiable success. Our experiments demonstrate that this approach not only thrives in scenarios where existing solvers fail but also maintains effectiveness and scalability across diverse benchmark domains.
翻译:在不确定环境中导航并遵守复杂时间约束的自主智能体合成仍是一项根本性挑战。尽管线性时序逻辑(Linear Temporal Logic, LTL)为指定此类任务提供了严谨语言,但在部分可观测马尔可夫决策过程中对LTL满足性进行定性验证的固有不可判定性,使得定量综合(尤其是为近似求解器设计可靠奖励信号)变得困难。本文通过提出一种新颖且可靠的奖励塑造机制来填补这一空白,该机制能动态生成基于可认证LTL满足性的信念依赖奖励。通过将该机制集成到增强型蒙特卡洛规划框架中,我们使智能体能够在部分可观测的"迷雾"中导航,同时其搜索过程聚焦于最大化可验证的成功概率。实验表明,该方法不仅在现有求解器失效的场景中表现出色,而且在多样化基准领域间保持有效性和可扩展性。