We present an optimization-based framework for robust permissive synthesis for Interval Markov Decision Processes (IMDPs), motivated by robotic decision-making under transition uncertainty. In many robotic systems, model inaccuracies and sensing noise lead to interval-valued transition probabilities. While robust IMDP synthesis typically yields a single policy and permissive synthesis assumes exact models, we show that robust permissive synthesis under interval uncertainty can be cast as a global mixed-integer linear program (MILP) that directly encodes robust Bellman constraints. The formulation maximizes a quantitative permissiveness metric (the number of enabled state-action pairs), while guaranteeing that every compliant strategy satisfies probabilistic reachability or expected reward specifications under all admissible transition realizations. To address the exponential complexity of vertex-based uncertainty representations, we derive a dualization-based encoding that eliminates explicit vertex enumeration and scales linearly with the number of successors. Experimental evaluation on four representative robotic benchmark domains demonstrates scalability to IMDPs with hundreds of thousands of states. The proposed framework provides a practical and general foundation for uncertainty-aware, flexibility-preserving controller synthesis in robotic systems.
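To make the robust Bellman constraints concrete: for each state-action pair, the adversary solves an inner linear program, choosing a transition distribution within the given intervals that minimizes the expected value. For interval ambiguity sets this inner LP has a well-known greedy solution (and its LP dual is what a dualization-based encoding, as in the abstract, turns into linear constraints without vertex enumeration). The sketch below is illustrative only, not the paper's MILP; the function name and tolerance are our own choices.

```python
def worst_case_expectation(lows, highs, values):
    """Inner problem of a robust Bellman operator for one state-action pair:
    minimize sum_i p_i * values[i]
    subject to lows[i] <= p_i <= highs[i] and sum_i p_i == 1.
    Greedy solution: start every successor at its lower bound, then assign
    the remaining probability mass to successors with the smallest values.
    """
    # The interval set must contain at least one valid distribution.
    assert sum(lows) <= 1.0 + 1e-12 and sum(highs) >= 1.0 - 1e-12

    p = list(lows)
    slack = 1.0 - sum(lows)  # probability mass still to be distributed
    # Cheapest successors (lowest value) absorb the slack first.
    for i in sorted(range(len(values)), key=lambda i: values[i]):
        add = min(highs[i] - lows[i], slack)
        p[i] += add
        slack -= add
        if slack <= 0.0:
            break
    return sum(pi * v for pi, v in zip(p, values))


# Example: three successors, each with interval [0.1, 0.8].
# The adversary pushes all remaining mass onto the cheapest successor.
print(worst_case_expectation([0.1, 0.1, 0.1],
                             [0.8, 0.8, 0.8],
                             [0.0, 1.0, 2.0]))  # -> 0.3
```

The maximizing (optimistic) counterpart is symmetric: sort by descending value. Embedding this inner problem via its dual yields constraints whose size is linear in the number of successors, matching the scalability claim above.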