The emergence of ChatGPT has generated much speculation in the press about its potential to disrupt social and economic systems. Its astonishing language ability has aroused strong curiosity among scholars about its performance in different domains. Many studies have evaluated the abilities of ChatGPT and GPT-4 across different tasks and disciplines. However, a comprehensive review summarizing the collective assessment findings is lacking. The objective of this survey is to thoroughly analyze prior assessments of ChatGPT and GPT-4, focusing on their language and reasoning abilities, scientific knowledge, and ethical considerations. The survey also examines existing evaluation methods and offers several recommendations for future research on evaluating large language models.