Large language models have proven valuable across many fields. ChatGPT, developed by OpenAI, has been trained on massive amounts of data and simulates human conversation by comprehending context and generating appropriate responses. It has garnered significant attention for its ability to answer a broad range of human inquiries effectively, with fluent and comprehensive answers that surpass prior public chatbots in both security and usefulness. However, a comprehensive analysis of ChatGPT's failures is lacking, and providing one is the focus of this study. Eleven categories of failures are presented and discussed, including reasoning, factual errors, math, coding, and bias. The risks, limitations, and societal implications of ChatGPT are also highlighted. The goal of this study is to assist researchers and developers in enhancing future language models and chatbots.