A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT

Ce Zhou, Qian Li, Chen Li, Jun Yu, Yixin Liu, Guangjing Wang, Kai Zhang, Cheng Ji, Qiben Yan, Lifang He, Hao Peng, Jianxin Li, Jia Wu, Ziwei Liu, Pengtao Xie, Caiming Xiong, Jian Pei, Philip S. Yu, Lichao Sun

From arXiv, 99 pages, 16 figures

Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks with different data modalities. A PFM (e.g., BERT, ChatGPT, and GPT-4) is trained on large-scale data, which provides a reasonable parameter initialization for a wide range of downstream applications. BERT learns bidirectional encoder representations from Transformers, which are trained on large datasets as contextual language models. Similarly, the generative pretrained transformer (GPT) method employs Transformers as the feature extractor and is trained using an autoregressive paradigm on large datasets. Recently, ChatGPT has shown promising success among large language models; it applies an autoregressive language model with zero-shot or few-shot prompting. The remarkable achievements of PFMs have brought significant breakthroughs to various fields of AI. Numerous studies have proposed different methods, raising the demand for an updated survey. This study provides a comprehensive review of recent research advancements, challenges, and opportunities for PFMs in text, image, graph, and other data modalities. The review covers the basic components and existing pretraining methods used in natural language processing, computer vision, and graph learning. Additionally, it explores advanced PFMs used for different data modalities and unified PFMs that consider data quality and quantity. The review also discusses research related to the fundamentals of PFMs, such as model efficiency and compression, security, and privacy. Finally, the study provides key implications, future research directions, challenges, and open problems in the field of PFMs. Overall, this survey aims to shed light on research into PFMs with respect to scalability, security, logical reasoning ability, cross-domain learning ability, and user-friendly interactive ability for artificial general intelligence.

