As Large Language Models (LLMs) such as ChatGPT from OpenAI, BARD from Google, Llama2 from Meta, and Claude from Anthropic AI gain widespread use, ensuring their security and robustness is critical. The widespread adoption of these models depends heavily on their reliability and on the proper use of this technology. It is crucial to test these models thoroughly, not only to ensure their quality but also to identify possible misuse by adversaries for illegal activities such as hacking. This paper presents a novel study of how such large language models can be exploited through deceptive interactions. More specifically, the paper borrows well-known techniques from deception theory to investigate whether these models are susceptible to deceitful interactions. This research aims not only to highlight these risks but also to pave the way for robust countermeasures that enhance the security and integrity of language models in the face of sophisticated social engineering tactics. Through systematic experiments and analysis, we assess the models' performance in these critical security domains. Our results show that these large language models are susceptible to deception and social engineering attacks.