Comprehensive Evaluation of ChatGPT Reliability Through Multilingual Inquiries

ChatGPT is currently the most popular large language model (LLM), with over 100 million users, and it has a significant impact on people's lives. However, jailbreak vulnerabilities mean that ChatGPT can also have negative effects, potentially even facilitating criminal activities. Testing whether ChatGPT can be jailbroken is crucial because it can help improve ChatGPT's security, reliability, and social responsibility. Inspired by previous research revealing the varied performance of LLMs when translating between different languages, we suspected that wrapping prompts in multiple languages might jailbreak ChatGPT. To investigate this, we designed a fuzz-testing study of ChatGPT's cross-linguistic proficiency. The study automatically poses malicious questions to ChatGPT in three formats: (1) each malicious question is written in a single language, (2) a malicious question mixes multiple languages, and (3) ChatGPT is instructed to respond in a language different from that of the prompt. In addition, we combine these strategies with prompt injection templates that wrap the three types of questions. We examined a total of 7,892 Q&A data points and found that multilingual wrapping can indeed jailbreak ChatGPT, with different wrapping methods having varying effects on jailbreak probability. Prompt injection can amplify the probability of jailbreak caused by multilingual wrapping. This work provides insights for OpenAI developers to enhance ChatGPT's support for language diversity and inclusion.
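The test design described in the abstract can be pictured as a grid of configurations plus a per-strategy rate. The following is only a minimal sketch under stated assumptions, not the authors' tooling: the language codes, the template label, the `TestCase` fields, and the function names are all hypothetical, and the question text, the translation step, and the model call are deliberately left out. It shows how the three strategies might be enumerated with and without a prompt injection wrapper, and how a jailbreak rate could be tallied from manually labeled results.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import product
from typing import Optional

# Hypothetical placeholders: the abstract does not list languages or templates.
LANGUAGES = ["en", "zh", "fr"]
TEMPLATES = [None, "template_a"]  # None means no prompt injection wrapper


@dataclass(frozen=True)
class TestCase:
    question_id: int
    strategy: str
    prompt_language: str
    response_language: str
    template: Optional[str]


def build_cases(question_ids):
    """Enumerate one test configuration per question, strategy, and wrapper."""
    cases = []
    for qid, template in product(question_ids, TEMPLATES):
        for i, lang in enumerate(LANGUAGES):
            # Strategy 1: the question uses a single language throughout.
            cases.append(TestCase(qid, "single_language", lang, lang, template))
            # Strategy 3: request a reply in a language other than the prompt's.
            other = LANGUAGES[(i + 1) % len(LANGUAGES)]
            cases.append(
                TestCase(qid, "cross_language_response", lang, other, template)
            )
        # Strategy 2: one question that mixes several languages.
        cases.append(TestCase(qid, "multilingual", "mixed", "mixed", template))
    return cases


def jailbreak_rate(results):
    """Return the jailbreak fraction per (strategy, wrapped?) group.

    `results` is an iterable of (TestCase, bool) pairs, where the bool is an
    assumed manual label saying whether the response counted as a jailbreak.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for case, jailbroken in results:
        key = (case.strategy, case.template is not None)
        totals[key] += 1
        hits[key] += int(jailbroken)
    return {key: hits[key] / totals[key] for key in totals}
```

Grouping by both strategy and wrapper status keeps the two effects reported in the abstract separable: the wrapped and unwrapped rates for the same strategy can be compared directly.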
