Experience sharing (ES) accelerates multiagent reinforcement learning (MARL) in an advisor-advisee framework, but attempts to apply ES to decentralized multiagent systems have so far relied on trusted environments and overlooked the possibility of adversarial manipulation and inference. In a real-world setting, Byzantine attackers disguised as advisors may provide false advice to the advisee and catastrophically degrade overall learning performance. Likewise, an inference attacker disguised as an advisee may issue multiple queries to infer the advisors' private information, exposing the entire ES process to privacy leakage. To address these issues, we propose a novel MARL framework (BRNES) that heuristically selects a dynamic neighbor zone for each advisee at each learning step and adopts a weighted experience aggregation technique to reduce the impact of Byzantine attacks. Furthermore, to protect the agents' private information from adversarial inference attacks, we inject local differential privacy (LDP)-induced noise during the ES process. Our experiments show that our framework outperforms the state of the art in terms of steps to goal, obtained reward, and time to goal. In particular, our evaluation shows that the proposed framework is 8.32x faster than current non-private frameworks and 1.41x faster than private frameworks in an adversarial setting.
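To make the two defensive ideas concrete, the following is a minimal Python sketch, not the BRNES implementation. The abstract does not specify the weighting rule or the noise mechanism, so everything here is an assumption made for illustration: advisors perturb each shared Q-value with Laplace noise of scale `sensitivity / epsilon`, and the advisee weights each piece of advice by a hypothetical rule, `1 / (1 + L1 distance)` from its own estimate, before averaging. The function names (`laplace_noise`, `perturb`, `distance_weights`, `aggregate`) are likewise hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb(q_values, epsilon, sensitivity, rng):
    # Advisor side (assumed mechanism): add Laplace noise to each shared
    # Q-value before it leaves the agent.
    scale = sensitivity / epsilon
    return [q + laplace_noise(scale, rng) for q in q_values]


def distance_weights(own_q, advice):
    # Advisee side (hypothetical rule): advice far from the advisee's own
    # estimate receives a small weight, which damps outlying advice.
    return [
        1.0 / (1.0 + sum(abs(x - y) for x, y in zip(own_q, adv)))
        for adv in advice
    ]


def aggregate(own_q, advice, weights):
    # Weighted average of the advisee's own Q-values (weight 1) and the
    # advisors' Q-values (one weight per advisor).
    total = 1.0 + sum(weights)
    return [
        (q + sum(w * adv[a] for w, adv in zip(weights, advice))) / total
        for a, q in enumerate(own_q)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    own_q = [1.0, 2.0]
    honest = perturb([1.1, 2.1], epsilon=1.0, sensitivity=0.1, rng=rng)
    byzantine = [100.0, -100.0]  # deliberately false advice
    advice = [honest, byzantine]
    print(aggregate(own_q, advice, distance_weights(own_q, advice)))
```

In this sketch the outlying advice still enters the average, but its weight shrinks with its distance from the advisee's own estimate, so its influence stays bounded. A smaller `epsilon` gives stronger privacy at the cost of noisier advice.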