Artificial intelligence (AI) researchers have been developing and refining large language models (LLMs) that exhibit remarkable capabilities across a variety of domains and tasks, challenging our understanding of learning and cognition. The latest model developed by OpenAI, GPT-4, was trained using an unprecedented scale of compute and data. In this paper, we report on our investigation of an early version of GPT-4, when it was still in active development by OpenAI. We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with, for example, ChatGPT and Google's PaLM) that exhibit more general intelligence than previous AI models. We discuss the rising capabilities and implications of these models. We demonstrate that, beyond its mastery of language, GPT-4 can solve novel and difficult tasks that span mathematics, coding, vision, medicine, law, psychology and more, without needing any special prompting. Moreover, in all of these tasks, GPT-4's performance is strikingly close to human-level performance, and often vastly surpasses prior models such as ChatGPT. Given the breadth and depth of GPT-4's capabilities, we believe that it could reasonably be viewed as an early (yet still incomplete) version of an artificial general intelligence (AGI) system. In our exploration of GPT-4, we put special emphasis on discovering its limitations, and we discuss the challenges ahead for advancing towards deeper and more comprehensive versions of AGI, including the possible need for pursuing a new paradigm that moves beyond next-word prediction. We conclude with reflections on the societal influences of the recent technological leap and on future research directions.