Language modeling studies probability distributions over strings of text. It is one of the most fundamental tasks in natural language processing (NLP) and has been widely used in text generation, speech recognition, machine translation, and other applications. Conventional language models (CLMs) aim to predict the probability of linguistic sequences in a causal manner, while pre-trained language models (PLMs) cover broader concepts and can be used both for causal sequential modeling and, through fine-tuning, for downstream applications. PLMs have their own training paradigms (usually self-supervised) and serve as foundation models in modern NLP systems. This overview paper introduces both CLMs and PLMs from five aspects: linguistic units, architectures, training methods, evaluation methods, and applications. Furthermore, we discuss the relationship between CLMs and PLMs and shed light on future directions of language modeling in the pre-trained era.
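As a brief illustration of what predicting sequence probability "in a causal manner" usually means, the probability of a sequence can be factorized left to right with the chain rule. The notation here ($w_i$ for the $i$-th linguistic unit, $n$ for the sequence length) is assumed for illustration and is not taken from the paper itself:

```latex
P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
```

Each factor conditions only on the preceding units, which is what separates causal modeling from objectives that also use right-hand context.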