Prompt injection attacks pose serious security risks across a wide range of real-world applications. Although these attacks are receiving increasing attention, the community faces a critical gap: the lack of a unified platform for prompt injection evaluation. This gap makes it difficult to reliably compare defenses, understand their true robustness under diverse attacks, or assess how well they generalize across tasks and benchmarks. For instance, many defenses initially reported as effective were later found to exhibit limited robustness on diverse datasets and attacks. To bridge this gap, we introduce PIArena, a unified and extensible platform for prompt injection evaluation that enables users to easily integrate state-of-the-art attacks and defenses and evaluate them across a variety of existing and new benchmarks. We also design a dynamic strategy-based attack that adaptively optimizes injected prompts based on defense feedback. Through comprehensive evaluation using PIArena, we uncover critical limitations of state-of-the-art defenses: limited generalizability across tasks, vulnerability to adaptive attacks, and fundamental challenges when an injected task aligns with the target task. The code and datasets are available at https://github.com/sleeepeer/PIArena.