The Internet of Agents (IoA) envisions a unified, agent-centric paradigm in which heterogeneous large language model (LLM) agents can interconnect and collaborate at scale. Within this paradigm, federated fine-tuning (FFT) serves as a key enabler, allowing distributed LLM agents to co-train a global LLM without centralizing their local datasets. However, FFT-enabled IoA systems remain vulnerable to model poisoning attacks, in which adversaries upload malicious updates to the aggregation server to degrade the performance of the global LLM. This paper proposes a graph representation-based model poisoning (GRMP) attack, which exploits overheard benign updates to construct a feature correlation graph and employs a variational graph autoencoder (VGAE) to capture their structural dependencies and generate malicious updates. A novel attack algorithm based on the augmented Lagrangian method and subgradient descent is developed to optimize malicious updates that preserve benign-like statistics while embedding the adversarial objective. Experimental results show that the proposed GRMP attack substantially decreases accuracy across different LLMs while remaining statistically consistent with benign updates, thereby evading detection by existing defense mechanisms and posing a severe threat to the envisioned IoA paradigm.
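To make the attack pipeline described above concrete, the sketch below (not taken from the paper) illustrates one plausible instantiation in PyTorch: a feature correlation graph is built over the parameter coordinates of overheard benign updates, a small VGAE is fitted to that graph, and its decoded output seeds an augmented-Lagrangian/subgradient loop that pushes the crafted update toward a toy adversarial objective while constraining its distance from the benign statistics. All dimensions, the correlation threshold, the stealth radius `eps`, and the adversarial objective are illustrative assumptions rather than the paper's actual settings.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# --- Stand-in data: benign updates the attacker is assumed to overhear ---
# (in the paper these would be real client updates; here they are random placeholders)
num_benign, dim = 8, 64
benign_updates = torch.randn(num_benign, dim)            # one row per benign client

# --- Step 1: feature correlation graph over parameter coordinates ---
corr = torch.corrcoef(benign_updates.T)                   # (dim, dim) correlation matrix
adj = (corr.abs() > 0.5).float()                          # edge if |corr| > 0.5 (assumed threshold)
adj.fill_diagonal_(1.0)
d = adj.sum(1).clamp(min=1e-6).pow(-0.5)
adj_norm = d[:, None] * adj * d[None, :]                  # symmetric normalization

# --- Step 2: a minimal variational graph autoencoder (VGAE) ---
class VGAE(nn.Module):
    def __init__(self, in_dim, hid_dim, lat_dim):
        super().__init__()
        self.enc = nn.Linear(in_dim, hid_dim)
        self.mu, self.logvar = nn.Linear(hid_dim, lat_dim), nn.Linear(hid_dim, lat_dim)
        self.dec = nn.Linear(lat_dim, in_dim)

    def forward(self, x, a):
        h = F.relu(a @ self.enc(x))                        # graph-convolution-style propagation
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

x = benign_updates.T                                       # node features: coordinate values across clients
model = VGAE(in_dim=num_benign, hid_dim=32, lat_dim=16)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):                                       # fit the VGAE to the benign structure
    recon, mu, logvar = model(x, adj_norm)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = F.mse_loss(recon, x) + 1e-3 * kl
    opt.zero_grad(); loss.backward(); opt.step()

# --- Step 3: craft a malicious update with an augmented-Lagrangian-style loop ---
with torch.no_grad():
    recon, _, _ = model(x, adj_norm)
benign_mean = benign_updates.mean(0)
w = recon.mean(1).clone().requires_grad_(True)             # benign-looking starting point
lam, rho, eps = 0.0, 1.0, 1.0                              # multiplier, penalty, stealth radius (assumed)
for _ in range(100):
    attack_obj = torch.dot(w, benign_mean)                 # toy objective: anti-align with the benign direction
    constraint = torch.norm(w - benign_mean) - eps         # stay within eps of the benign statistics
    lagrangian = attack_obj + lam * constraint + 0.5 * rho * constraint.clamp(min=0) ** 2
    grad, = torch.autograd.grad(lagrangian, w)
    with torch.no_grad():
        w -= 0.05 * grad                                    # subgradient descent step on the primal
    lam = max(0.0, lam + rho * constraint.item())           # dual ascent on the multiplier
malicious_update = w.detach()                               # uploaded in place of a benign update
```

The constraint `torch.norm(w - benign_mean) <= eps` is what keeps the crafted update statistically close to the benign ones, which is the property the abstract credits for evading distance-based defenses; the real GRMP objective and constraint set would replace the toy choices used here.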