Multi-Armed Bandit (MAB) systems are increasingly applied in multi-agent distributed environments, which has driven the development of collaborative MAB algorithms. In such settings, communication between the agents that execute actions and the central learner that makes decisions can hinder the learning process. A prevalent challenge in distributed learning is action erasure, often caused by communication delays and/or channel noise. When an action is erased, the agent may not receive the action the learner intended, and the feedback the learner then observes can be misleading. In this paper, we introduce novel algorithms that enable a learner to interact concurrently with distributed agents across heterogeneous action erasure channels, that is, channels with different action erasure probabilities. We show that, in contrast to existing bandit algorithms, which suffer linear regret in this setting, our algorithms guarantee sub-linear regret. Our proposed solutions are built on a carefully designed repetition protocol and on scheduling of learning across the heterogeneous channels. To our knowledge, these are the first algorithms capable of learning effectively through heterogeneous action erasure channels. We substantiate the superior performance of our algorithms through numerical experiments, underscoring their practical significance for addressing communication constraints and delays in multi-agent environments.
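To make the idea of a repetition protocol over heterogeneous erasure channels concrete, the following is a minimal sketch and not the algorithm from the paper. It assumes a specific agent behavior that the abstract does not state: when a transmission is erased, the agent keeps playing the last action it received. Under that assumption, if the learner repeats an action `repeats` times over a channel with erasure probability `eps`, the final pull uses the intended action unless every transmission in the block was erased, which happens with probability `eps ** repeats`. The function names and the `target_failure` parameter are hypothetical and introduced only for illustration.

```python
import random


def final_pull_matches(eps, repeats, rng):
    """Return True if the last pull in a block uses the intended action.

    Assumption (not from the source): on an erasure the agent repeats the
    last action it received, so the final pull is wrong only if every
    transmission in the block was erased.
    """
    received = False
    for _ in range(repeats):
        if rng.random() >= eps:  # this transmission got through
            received = True
    return received


def empirical_match_rate(eps, repeats, trials=20000, seed=0):
    """Estimate by simulation how often the final pull is the intended one."""
    rng = random.Random(seed)
    hits = sum(final_pull_matches(eps, repeats, rng) for _ in range(trials))
    return hits / trials


def repeats_needed(eps, target_failure):
    """Smallest block length alpha with eps ** alpha <= target_failure."""
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    if target_failure <= 0.0:
        raise ValueError("target_failure must be positive")
    alpha = 1
    while eps ** alpha > target_failure:
        alpha += 1
    return alpha


if __name__ == "__main__":
    # Heterogeneous channels: each agent has its own erasure probability.
    channel_eps = [0.1, 0.5, 0.9]
    for eps in channel_eps:
        alpha = repeats_needed(eps, target_failure=0.01)
        rate = empirical_match_rate(eps, alpha)
        print(f"eps={eps}: repeats={alpha}, empirical match rate={rate:.3f}")
```

The sketch illustrates why heterogeneity matters for scheduling: a noisier channel needs a longer repetition block to reach the same reliability, so a learner serving several agents has to decide how to spread its learning across channels of different quality.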