Decision making in distributed autonomous environments is challenging because of enormous state spaces and uncertainty. Many online planning algorithms rely on statistical sampling to avoid searching the whole state space while still making acceptable decisions. However, planning often has to be performed under strict computational constraints, which severely limits online planning in multi-agent systems and can lead to poor system performance, especially in stochastic domains. In this paper, we propose Emergent Value function Approximation for Distributed Environments (EVADE), an approach that integrates global experience into multi-agent online planning in stochastic domains so that global effects are considered during local planning. For this purpose, a value function is approximated online from the emergent system behaviour using reinforcement learning methods. We empirically evaluate EVADE with two statistical multi-agent online planning algorithms in a highly complex and stochastic smart factory environment, where multiple agents need to process various items at a shared set of machines. Our experiments show that EVADE can effectively improve the performance of multi-agent online planning while remaining efficient with respect to the breadth and depth of the planning process.
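The general idea of combining sampling-based online planning with a value function learned online can be illustrated with a minimal sketch. This is not the EVADE implementation. It assumes a single local planner, a linear value function updated with TD(0), and uniformly random rollouts truncated at a short horizon. All names here (`LinearValueFunction`, `plan`, `simulate`, `featurize`) are hypothetical and chosen for illustration only.

```python
import random


class LinearValueFunction:
    """Hypothetical linear approximator V(s) = w . phi(s), updated online with TD(0)."""

    def __init__(self, num_features, learning_rate=0.1, discount=0.95):
        self.weights = [0.0] * num_features
        self.learning_rate = learning_rate
        self.discount = discount

    def value(self, features):
        return sum(w * x for w, x in zip(self.weights, features))

    def update(self, features, reward, next_features, terminal=False):
        """One TD(0) step from an observed transition; returns the TD error."""
        target = reward
        if not terminal:
            target += self.discount * self.value(next_features)
        td_error = target - self.value(features)
        self.weights = [
            w + self.learning_rate * td_error * x
            for w, x in zip(self.weights, features)
        ]
        return td_error


def plan(state, actions, simulate, featurize, value_fn,
         horizon=3, num_rollouts=20, rng=None):
    """Return the first action with the best mean return over truncated rollouts.

    Assumption: simulate(state, action, rng) -> (next_state, reward, done).
    Rollouts that reach the horizon without terminating are completed with
    the learned value estimate instead of being simulated further.
    """
    rng = rng or random.Random(0)
    best_action, best_score = None, float("-inf")
    for first_action in actions:
        total = 0.0
        for _ in range(num_rollouts):
            s, ret, discount, done = state, 0.0, 1.0, False
            action = first_action
            for _ in range(horizon):
                s, reward, done = simulate(s, action, rng)
                ret += discount * reward
                discount *= value_fn.discount
                if done:
                    break
                action = rng.choice(actions)  # random rollout policy
            if not done:
                # Bootstrap from the learned value function at the leaf.
                ret += discount * value_fn.value(featurize(s))
            total += ret
        score = total / num_rollouts
        if score > best_score:
            best_action, best_score = first_action, score
    return best_action
```

In such a setup, each agent would call `plan` locally with a small `horizon` and `num_rollouts`, while `update` is fed transitions and rewards observed from the executed joint behaviour. The leaf estimate is what lets a shallow, narrow search account for longer-term effects.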