This paper studies temporal planning in probabilistic environments, modeled as labeled Markov decision processes (MDPs), with user preferences over multiple temporal goals. Existing works reflect such preferences as a prioritized list of goals. This paper introduces a new specification language, termed prioritized qualitative choice linear temporal logic on finite traces, which augments linear temporal logic on finite traces with prioritized conjunction and ordered disjunction from prioritized qualitative choice logic. This language allows for succinctly specifying temporal objectives with corresponding preferences accomplishing each temporal task. The finite traces that describe the system's behaviors are ranked based on their dissatisfaction scores with respect to the formula. We propose a systematic translation from the new language to a weighted deterministic finite automaton. Utilizing this computational model, we formulate and solve a problem of computing an optimal policy that minimizes the expected score of dissatisfaction given user preferences. We demonstrate the efficacy and applicability of the logic and the algorithm on several case studies with detailed analyses for each.
翻译:本文研究概率环境下的时序规划问题,该环境被建模为带标签的马尔可夫决策过程(MDP),并考虑了用户对多个时序目标的偏好。现有工作将此类偏好表示为目标的优先列表。本文引入了一种新的规范语言,称为有限轨迹上带优先偏好的定性选择线性时序逻辑,该逻辑通过引入优先定性选择逻辑中的优先合取和有序析取算子,扩展了有限轨迹上的线性时序逻辑。该语言能简洁地指定具有对应偏好的时序目标,从而完成每个时序任务。描述系统行为的有限轨迹根据其对公式的未满足得分进行排序。我们提出了一种从新语言到加权确定性有限自动机的系统转换方法。利用该计算模型,我们建立并求解了一个问题,即计算一个最优策略,以最小化给定用户偏好下的期望未满足得分。通过多个案例研究及详细分析,我们展示了该逻辑与算法的有效性和适用性。