ChatGPT is a chatbot service recently released by OpenAI that has received increasing attention over the past few months. While various aspects of ChatGPT have been evaluated, its robustness, i.e., its performance when facing unexpected inputs, remains unclear to the public. Robustness is of particular concern in responsible AI, especially for safety-critical applications. In this paper, we conduct a thorough evaluation of the robustness of ChatGPT from the adversarial and out-of-distribution (OOD) perspectives. To do so, we employ the AdvGLUE and ANLI benchmarks to assess adversarial robustness, and the Flipkart review and DDXPlus medical diagnosis datasets for OOD evaluation. We select several popular foundation models as baselines. The results indicate that ChatGPT shows no consistent advantage on adversarial and OOD classification tasks, while performing well on translation tasks. This suggests that adversarial and OOD robustness remains a significant threat to foundation models. Moreover, ChatGPT shows astounding performance in understanding dialogue-related texts, and we find that it tends to provide informal suggestions for medical tasks instead of definitive answers. Finally, we present in-depth discussions of possible research directions.