Flocking motion control is concerned with managing the possible conflicts between the local and team objectives of multi-agent systems. The overall control process guides the agents while monitoring flock cohesiveness and localization. The underlying mechanisms may degrade when the unmodeled uncertainties associated with the flock dynamics and formation are overlooked. In addition, the efficiency of a control design depends on how quickly it can adapt to different dynamic situations in real time. An online model-free policy iteration mechanism is developed here to guide a flock of agents to follow an independent command generator over a time-varying graph topology. The strength of connectivity between any two agents, that is, the graph edge weight, is determined by a position-adjacency-dependent function. An online recursive least squares approach is adopted to tune the guidance strategies without knowledge of the dynamics of the agents or of the command generator. The mechanism is compared with a reinforcement learning approach from the literature that is based on value iteration. In simulation, the policy iteration mechanism learned and converged quickly and required less computational effort than the value iteration approach.
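To make the two ingredients named above more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a generic recursive least squares (RLS) estimator for a linear-in-parameters model, and a hypothetical position-dependent edge weight with an assumed cutoff radius. The names `edge_weight`, `RecursiveLeastSquares`, `radius`, `lam`, and `delta` are illustrative choices and do not come from the source, which does not state the specific weight function or regression vector used.

```python
import math


def edge_weight(p_i, p_j, radius=2.0):
    """Hypothetical position-dependent edge weight (an assumption, not the
    paper's function): 1 at zero distance, decaying with distance, and 0
    once the agents are at least `radius` apart."""
    d = math.dist(p_i, p_j)
    return math.exp(-d * d) if d < radius else 0.0


class RecursiveLeastSquares:
    """Standard RLS estimator for y ~ theta . phi with forgetting factor lam."""

    def __init__(self, n, lam=1.0, delta=1e3):
        self.n = n
        self.lam = lam
        self.theta = [0.0] * n
        # Large initial covariance: low confidence in the initial estimate.
        self.P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]

    def update(self, phi, y):
        """Process one sample (phi, y) and return the a priori prediction error."""
        n = self.n
        P_phi = [sum(self.P[i][j] * phi[j] for j in range(n)) for i in range(n)]
        denom = self.lam + sum(phi[i] * P_phi[i] for i in range(n))
        gain = [v / denom for v in P_phi]
        err = y - sum(self.theta[i] * phi[i] for i in range(n))
        self.theta = [self.theta[i] + gain[i] * err for i in range(n)]
        # Covariance update; P is symmetric, so phi^T P equals (P phi)^T.
        self.P = [
            [(self.P[i][j] - gain[i] * P_phi[j]) / self.lam for j in range(n)]
            for i in range(n)
        ]
        return err


if __name__ == "__main__":
    # Recover the weights of y = 2*x1 - 3*x2 from streaming samples.
    rls = RecursiveLeastSquares(2)
    for phi in [(1, 0), (0, 1), (1, 1), (2, -1), (-1, 2)]:
        rls.update(phi, 2 * phi[0] - 3 * phi[1])
    print(rls.theta)
```

In a policy iteration setting, an estimator of this kind would typically be fed regression vectors built from measured states and controls, so that the value function weights are updated sample by sample without a model of the dynamics. How those vectors are formed here is left open, since the text above does not specify it.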