We report the development of GPT-4, a large-scale, multimodal model that can accept image and text inputs and produce text outputs. While less capable than humans in many real-world scenarios, GPT-4 exhibits human-level performance on various professional and academic benchmarks, including passing a simulated bar exam with a score around the top 10% of test takers. GPT-4 is a Transformer-based model pre-trained to predict the next token in a document. The post-training alignment process improves performance on measures of factuality and adherence to desired behavior. A core component of this project was developing infrastructure and optimization methods that behave predictably across a wide range of scales. This allowed us to accurately predict some aspects of GPT-4's performance based on models trained with no more than 1/1,000th the compute of GPT-4.
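To illustrate what predicting performance from much smaller training runs can look like, here is a minimal sketch that fits a power law to losses from small-compute runs and extrapolates to the full-compute run. It is an illustration under stated assumptions, not the method used for GPT-4: the functional form `loss = a * compute**b`, the helper names `fit_power_law` and `predict_loss`, and all data points are hypothetical and synthetic.

```python
import math


def fit_power_law(compute, loss):
    """Least-squares fit of loss = a * compute**b, done in log-log space.

    Taking logs turns the power law into a straight line:
    log(loss) = log(a) + b * log(compute).
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope = power-law exponent
    a = math.exp(mean_y - b * mean_x)   # intercept = log of the prefactor
    return a, b


def predict_loss(a, b, compute):
    """Evaluate the fitted power law at a given compute budget."""
    return a * compute ** b


# Synthetic (made-up) data. Compute is expressed as a fraction of the
# target run, so 1e-3 means 1/1,000th of the full compute budget.
small_compute = [1e-6, 1e-5, 1e-4, 1e-3]
small_loss = [2.0 * c ** -0.05 for c in small_compute]

a, b = fit_power_law(small_compute, small_loss)
predicted = predict_loss(a, b, 1.0)  # extrapolate to the full-scale run
print(f"fitted a={a:.4f}, b={b:.4f}, predicted full-scale loss={predicted:.4f}")
```

The fit uses only runs at or below 1/1,000th of the target compute, mirroring the setup described in the abstract. Real loss curves are noisy and may need an additional constant term for irreducible loss, which this sketch omits for simplicity.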