Large Language Models (LLMs) have led to significant improvements in many tasks across various domains, such as code interpretation, response generation, and ambiguity handling. When these LLMs are upgraded, however, the updates primarily prioritize enhancing user experience while neglecting security, privacy, and safety implications. Consequently, unintended vulnerabilities or biases can be introduced. Previous studies have predominantly focused on specific versions of the models and have disregarded the potential emergence of new attack vectors targeting the updated versions. Through the lens of adversarial examples within the in-context learning framework, this longitudinal study addresses this gap by conducting a comprehensive assessment of the robustness of successive versions of LLMs, focusing on GPT-3.5. We conduct extensive experiments to analyze and understand how updates affect robustness in two distinct learning categories: zero-shot learning and few-shot learning. Our findings indicate that, in comparison to earlier versions of LLMs, the updated versions do not exhibit the anticipated level of robustness against adversarial attacks. In addition, our study highlights the increased effectiveness of synergized adversarial queries in most zero-shot learning and few-shot learning cases. We hope that our study can lead to a more refined assessment of the robustness of LLMs over time and provide valuable insights into these models for both developers and users.