The increasing reliance on Large Language Models (LLMs) across academia and industry necessitates a comprehensive understanding of their robustness to prompts. In response to this vital need, we introduce PromptRobust, a benchmark designed to measure LLMs' resilience to adversarial prompts. This study employs a wide range of adversarial textual attacks targeting prompts at multiple levels: character, word, sentence, and semantic. The adversarial prompts, crafted to mimic plausible user errors such as typos and synonym substitutions, evaluate how slight deviations can affect LLM outcomes while preserving semantic integrity. These prompts are then employed in diverse tasks including sentiment analysis, natural language inference, reading comprehension, machine translation, and math problem-solving. Our study generates 4,788 adversarial prompts, meticulously evaluated over 8 tasks and 13 datasets. Our findings demonstrate that contemporary LLMs are not robust to adversarial prompts. Furthermore, we present a comprehensive analysis of the factors underlying prompt robustness and its transferability, and offer practical recommendations for prompt composition that benefit both researchers and everyday users.
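To make the attack levels concrete, the sketch below illustrates character- and word-level prompt perturbations of the kind the benchmark describes. The helper names (`perturb_char`, `perturb_word`) and the synonym table are illustrative assumptions, not PromptRobust's actual attack implementations.

```python
import random

# Hypothetical synonym table for word-level substitutions (illustrative only).
SYNONYMS = {"classify": "categorize", "determine": "judge", "answer": "respond to"}

def perturb_char(prompt: str, rate: float = 0.05, seed: int = 0) -> str:
    """Character-level attack: randomly drop characters to mimic typos."""
    rng = random.Random(seed)
    return "".join(c for c in prompt if rng.random() > rate)

def perturb_word(prompt: str) -> str:
    """Word-level attack: replace words with synonyms, preserving meaning.
    (Capitalization is not restored in this minimal sketch.)"""
    return " ".join(SYNONYMS.get(w.lower(), w) for w in prompt.split())

prompt = "Classify the sentiment of the following sentence."
print(perturb_char(prompt))  # e.g. "Clasify the sentimnt of the folowing sentence."
print(perturb_word(prompt))  # "categorize the sentiment of the following sentence."
```

An attacked prompt is judged successful when it degrades task performance while a human reader would still interpret it as the same instruction, which is why the perturbations above stay small and meaning-preserving.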