With many promising applications in domains such as IoT and Industry 4.0, task-oriented communication design (TOCD) is attracting growing attention from the research community. This paper presents a novel approach for designing scalable task-oriented quantization and communications in cooperative multi-agent systems (MAS). The proposed approach uses the TOCD framework and the value of information (VoI) concept to enable efficient communication of quantized observations among agents while maximizing the average return of the MAS, a metric that quantifies the MAS's task effectiveness. The computational complexity of learning the VoI, however, grows exponentially with the number of agents. We therefore propose a three-step framework: (i) learning the VoI for a two-agent system using reinforcement learning (RL), (ii) designing the quantization policy for an $N$-agent MAS using the learned VoI for a range of bit-budgets, and (iii) learning the agents' control policies using RL while following the quantization policies designed in the previous step. We observe that the computational cost of obtaining the VoI can be reduced by exploiting insights gained from studying a similar two-agent system instead of the original $N$-agent system. We then quantize agents' observations such that their more valuable observations are communicated more precisely. Our analytical results show that the proposed framework is applicable to a wide range of problems. Numerical results show a substantial reduction in the computational complexity of obtaining the VoI needed for TOCD in a MAS problem, without compromising the average return of the MAS.
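The idea that more valuable observations are communicated more precisely can be illustrated with a minimal sketch. It is not the paper's quantization policy: it assumes a finite observation set, a VoI table that has already been learned, and a simple rule in which the highest-VoI observations each receive a dedicated codeword while all remaining observations share one. The function name `voi_quantizer` and the example VoI values are hypothetical.

```python
from typing import Dict, Hashable


def voi_quantizer(voi: Dict[Hashable, float], bits: int) -> Dict[Hashable, int]:
    """Map each observation to a codeword index using at most 2**bits codewords.

    Hypothetical rule (an assumption, not the paper's method): the
    (2**bits - 1) observations with the highest VoI each get a dedicated
    codeword; every remaining observation shares the last codeword.
    """
    if bits < 0:
        raise ValueError("bits must be non-negative")
    n_codewords = 2 ** bits
    # Rank observations from most to least valuable.
    ranked = sorted(voi, key=voi.get, reverse=True)
    # Ranks beyond the budget collapse into the final, shared codeword.
    return {obs: min(rank, n_codewords - 1) for rank, obs in enumerate(ranked)}


# Assumed VoI values for four observations, for illustration only.
example_voi = {"s0": 0.1, "s1": 0.9, "s2": 0.05, "s3": 0.4}
codebook_1bit = voi_quantizer(example_voi, bits=1)
codebook_2bit = voi_quantizer(example_voi, bits=2)
```

With a 1-bit budget only the most valuable observation (`s1`) is distinguishable from the rest; with 2 bits every observation in this small example gets its own codeword.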