The prosperity of text-to-image (T2I) models has fostered a vibrant share-and-play ecosystem centered on Low-Rank Adaptation (LoRA) plugins, which allow users to customize and share model capabilities with ease. This democratization, however, comes with a hidden but severe security risk. Malicious users can distribute seemingly benign LoRA plugins that contain hidden functionality, poisoning model-sharing marketplaces such as Civitai and Liblib, undermining the user trust that underpins this collaborative ecosystem, and threatening the safety of countless downstream applications. Despite these risks, plugin poisoning in the real-world T2I ecosystem remains underexplored. This paper introduces PoisonLoRA, the first systematic study of LoRA plugin supply-chain risks, which exploits the trust and characteristics of the T2I ecosystem. We identify two primary attack instances: (1) Concept Hijacking, where a hijacked LoRA generates images designed to influence public opinion and spread propaganda, and (2) Task Injection, where a LoRA is injected with a hidden task that produces harmful content (e.g., NSFW images) only when activated by a secret key. Critically, the malicious payload persists and propagates in a virus-like manner: it weaponizes the very act of creative collaboration (e.g., LoRA merging) to spread, turning every remix into a new carrier. Extensive experiments validate that PoisonLoRA is both effective and stealthy. Specifically, we achieve attack success rates (ASR) of approximately 100% on both Civitai and Liblib, on 6 datasets across 4 scenarios, without being detected by the platforms. The poisoned LoRA is also highly robust, retaining nearly 100% ASR even when transferred to different base models and remixed more than 5 times.