Since the release of ChatGPT in November 2022, Large Language Models (LLMs) have drawn considerable attention for their strong performance on a wide range of natural language tasks. LLMs acquire their general-purpose language understanding and generation abilities by training billions of model parameters on massive amounts of text data, as predicted by scaling laws \cite{kaplan2020scaling,hoffmann2022training}. Although very recent, the research area of LLMs is evolving rapidly in many different directions. In this paper, we review some of the most prominent LLMs, including three popular LLM families (GPT, LLaMA, PaLM), and discuss their characteristics, contributions, and limitations. We also give an overview of techniques developed to build and augment LLMs. We then survey popular datasets prepared for LLM training, fine-tuning, and evaluation, review widely used LLM evaluation metrics, and compare the performance of several popular LLMs on a set of representative benchmarks. Finally, we conclude the paper by discussing open challenges and future research directions.