Large language models have become one of the most commonly deployed NLP inventions. In the past half-decade, their integration into core natural language processing tools has dramatically increased the performance of such tools, and they have entered the public discourse surrounding artificial intelligence. Consequently, it is important for developers and researchers alike to understand the mathematical foundations of large language models, as well as how to implement them. These notes accompany the theoretical portion of the ETH Zürich course on large language models and cover what constitutes a language model from a formal, theoretical perspective.