When Markov Decision Processes allow concurrent actions, their state and action spaces grow exponentially in the number of objects, and computing a policy becomes highly inefficient because it requires enumerating the joint state and action space. For the case of indistinguishable objects, we present a first-order representation that tackles this exponential blow-up in both spaces. We propose Foreplan, an efficient relational forward planner that uses the first-order representation to compute policies in time and space polynomial in the number of objects. Foreplan thus significantly increases the number of planning problems that can be solved exactly in reasonable time, which we underscore with a theoretical analysis. To speed up computation even further, we also introduce an approximate version of Foreplan with guarantees on the error. Finally, we provide an empirical evaluation of both Foreplan versions, demonstrating a speedup of several orders of magnitude, and show empirically that the error induced by the approximate version is often negligible.
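To make the exponential-versus-polynomial contrast concrete, the following is a minimal counting sketch, not Foreplan's actual representation. It assumes, purely for illustration, that each object has a fixed number of local states and that indistinguishable objects can be summarized by how many of them are in each local state. The function names `ground_state_count` and `lifted_state_count` are hypothetical.

```python
from math import comb


def ground_state_count(num_objects: int, local_states: int) -> int:
    """Number of joint states when every object is tracked individually."""
    return local_states ** num_objects


def lifted_state_count(num_objects: int, local_states: int) -> int:
    """Number of joint states when objects are indistinguishable.

    Only the count of objects per local state matters, so this is the
    number of ways to split num_objects into local_states non-negative
    counts (stars and bars).
    """
    return comb(num_objects + local_states - 1, local_states - 1)


if __name__ == "__main__":
    # Assumption for illustration: two local states per object.
    for n in (5, 10, 20, 40):
        print(n, ground_state_count(n, 2), lifted_state_count(n, 2))
```

Under these assumptions the individually tracked count grows as `local_states ** num_objects`, while the count-based one is a polynomial in the number of objects for a fixed number of local states.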