ChatGPT is a recent chatbot service released by OpenAI that has received increasing attention over the past few months. While various aspects of ChatGPT have been evaluated, its robustness, i.e., its performance on unexpected inputs, is still unclear to the public. Robustness is of particular concern in responsible AI, especially for safety-critical applications. In this paper, we conduct a thorough evaluation of the robustness of ChatGPT from the adversarial and out-of-distribution (OOD) perspectives. To do so, we employ the AdvGLUE and ANLI benchmarks to assess adversarial robustness, and the Flipkart review and DDXPlus medical diagnosis datasets for OOD evaluation. We select several popular foundation models as baselines. Results show that ChatGPT has consistent advantages on most adversarial and OOD classification and translation tasks. However, its absolute performance is far from perfect, which suggests that adversarial and OOD robustness remains a significant threat to foundation models. Moreover, ChatGPT shows astounding performance in understanding dialogue-related texts, and we find that it tends to provide informal suggestions for medical tasks instead of definitive answers. Finally, we present in-depth discussions of possible research directions.