Sparks of Artificial General Intelligence: Early experiments with GPT-4

Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang

Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on societal influences of the recent technological leap and future research directions.
