A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT

Ce Zhou, Qian Li, Chen Li, Jun Yu, Yixin Liu, Guangjing Wang, Kai Zhang, Cheng Ji, Qiben Yan, Lifang He, Hao Peng, Jianxin Li, Jia Wu, Ziwei Liu, Pengtao Xie, Caiming Xiong, Jian Pei, Philip S. Yu, Lichao Sun

From arXiv, 97 pages, 16 figures

Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks across different data modalities. A PFM, such as BERT, GPT-3, MAE, DALL-E, or ChatGPT, is trained on large-scale data, which provides a reasonable parameter initialization for a wide range of downstream applications. The idea of pretraining behind PFMs plays an important role in the application of large models. Unlike earlier methods that use convolutional and recurrent modules for feature extraction, the generative pre-training (GPT) method uses the Transformer as the feature extractor and is trained on large datasets under an autoregressive paradigm. Similarly, BERT applies Transformers to train on large datasets as a contextual language model. Recently, ChatGPT has shown promising success among large language models; it applies an autoregressive language model with zero-shot or few-shot prompting. With the extraordinary success of PFMs, AI has made waves in a variety of fields over the past few years. Numerous methods, datasets, and evaluation metrics have been proposed in the literature, and the need for an updated survey is growing. This study provides a comprehensive review of recent research advancements, current and future challenges, and opportunities for PFMs in text, image, graph, and other data modalities. We first review the basic components and existing pretraining methods in natural language processing, computer vision, and graph learning. We then discuss advanced PFMs for other data modalities, as well as unified PFMs that take data quality and quantity into account. We also discuss research on the fundamentals of PFMs, including model efficiency and compression, security, and privacy. Finally, we lay out key implications, future research directions, challenges, and open problems.

