Multi-agent multi-armed bandit (MAMAB) is a classic collaborative learning model that has attracted considerable attention in recent years. However, existing studies do not consider the case where an agent may refuse to share all of her information with others, e.g., when some of the data involves personal privacy. In this paper, we propose a novel limited shared information multi-agent multi-armed bandit (LSI-MAMAB) model, in which each agent shares only the information she is willing to share, and we propose the Balanced-ETC algorithm to help multiple agents collaborate efficiently under limited shared information. Our analysis shows that Balanced-ETC is asymptotically optimal and that its average regret per agent approaches a constant when sufficiently many agents participate. Moreover, to encourage agents to participate in this collaborative learning, we propose an incentive mechanism that ensures each agent benefits from the collaborative system. Finally, we present experimental results that validate our theoretical results.