The increasing reliance on Large Language Models (LLMs) across academia and industry calls for a thorough understanding of their robustness to prompts. To address this need, we introduce PromptBench, a robustness benchmark designed to measure LLMs' resilience to adversarial prompts. This study applies a broad set of adversarial textual attacks that target prompts at four levels: character, word, sentence, and semantic. The resulting prompts are then used in diverse tasks, such as sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. Our study generates 4,032 adversarial prompts, evaluated over 8 tasks and 13 datasets, with 567,084 test samples in total. Our findings demonstrate that contemporary LLMs are vulnerable to adversarial prompts. Furthermore, we present a comprehensive analysis of the factors behind prompt robustness and its transferability. Based on this analysis, we offer practical recommendations for prompt composition that benefit both researchers and everyday users. We make our code, prompts, and methodologies for generating adversarial prompts publicly accessible to enable and encourage collaborative work in this field: https://github.com/microsoft/promptbench.
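To make the idea of attacking a prompt at different levels concrete, the following is a minimal sketch of a character-level and a sentence-level perturbation, together with a relative accuracy drop. It is not the PromptBench implementation or API: the function names (`char_level_attack`, `sentence_level_attack`, `relative_drop`, `accuracy`), the distractor string, and the toy model are all assumptions made for illustration, and the perturbations are random rather than adversarially searched.

```python
import random
from typing import Callable, List, Tuple


def char_level_attack(prompt: str, n_edits: int = 2, seed: int = 0) -> str:
    """Hypothetical character-level perturbation: swap two adjacent inner
    letters in up to n_edits randomly chosen words (typo-style noise)."""
    rng = random.Random(seed)
    words = prompt.split(" ")
    # Only touch purely alphabetic words long enough to have two inner letters.
    candidates = [i for i, w in enumerate(words) if w.isalpha() and len(w) >= 4]
    for i in rng.sample(candidates, min(n_edits, len(candidates))):
        w = words[i]
        j = rng.randrange(1, len(w) - 2)  # keep first and last letters fixed
        words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)


def sentence_level_attack(prompt: str, distractor: str = "and true is true") -> str:
    """Hypothetical sentence-level perturbation: append an irrelevant clause."""
    return f"{prompt} {distractor}"


def accuracy(model: Callable[[str, str], str], prompt: str,
             samples: List[Tuple[str, str]]) -> float:
    """Fraction of (text, label) samples the model labels correctly under a prompt."""
    correct = sum(model(prompt, text) == label for text, label in samples)
    return correct / len(samples)


def relative_drop(clean_acc: float, attacked_acc: float) -> float:
    """Accuracy lost under attack, relative to the clean accuracy (assumed metric)."""
    if clean_acc == 0:
        return 0.0
    return (clean_acc - attacked_acc) / clean_acc


def toy_model(prompt: str, text: str) -> str:
    """Stand-in for an LLM: it only 'understands' the task if the word
    'classify' survives intact in the prompt."""
    if "classify" not in prompt.lower():
        return "unknown"
    return "positive" if "good" in text else "negative"


if __name__ == "__main__":
    clean = "Please classify the sentiment of the following review"
    data = [("a good film", "positive"), ("a dull film", "negative")]
    for name, attacked in [("char", char_level_attack(clean, n_edits=3)),
                           ("sentence", sentence_level_attack(clean))]:
        drop = relative_drop(accuracy(toy_model, clean, data),
                             accuracy(toy_model, attacked, data))
        print(name, repr(attacked), drop)
```

A real evaluation would replace `toy_model` with calls to an actual LLM and would search for the perturbation that hurts performance most, rather than sampling edits at random.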