Pretrained Foundation Models (PFMs) are regarded as the foundation for various downstream tasks across different data modalities. A pretrained foundation model, such as BERT, GPT-3, MAE, DALL-E, or ChatGPT, is trained on large-scale data, which provides a reasonable parameter initialization for a wide range of downstream applications. The idea of pretraining behind PFMs plays an important role in the application of large models. Unlike earlier methods that use convolutional and recurrent modules for feature extraction, the Generative Pre-Training (GPT) method uses the Transformer as the feature extractor and is trained on large datasets with an autoregressive paradigm. Similarly, BERT applies Transformers to train on large datasets as a contextual language model. Recently, ChatGPT has shown promising success among large language models; it applies an autoregressive language model with zero-shot or few-shot prompting. With the extraordinary success of PFMs, AI has made waves in a variety of fields over the past few years. Numerous methods, datasets, and evaluation metrics have been proposed in the literature, and the need for an updated survey is growing. This study provides a comprehensive review of recent research advancements, current and future challenges, and opportunities for PFMs in text, image, graph, and other data modalities. We first review the basic components and existing pretraining methods in natural language processing, computer vision, and graph learning. We then discuss advanced PFMs for other data modalities, as well as unified PFMs that take data quality and quantity into account. In addition, we discuss research on the fundamentals of PFMs, including model efficiency and compression, security, and privacy. Finally, we lay out key implications, future research directions, challenges, and open problems.