This tutorial addresses the growing demand for models that go beyond the static capabilities of generic LLMs by surveying dynamic, domain-specific, and task-adaptive LLM adaptation techniques. While general LLMs demonstrate strong generalization across a variety of tasks, they often struggle in specialized domains such as finance, healthcare, and code generation for underrepresented languages. Moreover, their static nature limits their ability to evolve with a changing world, and their sheer size makes them impractical and costly to deploy at scale. As a result, LLM adaptation has drawn much attention since the advent of LLMs and is of core importance both to industry, which focuses on serving its targeted users, and to academia, which can greatly benefit from small but powerful LLMs.

We start with an introduction to LLM adaptation from both the data perspective and the model perspective. We then discuss how the relevant evaluation metrics and benchmarks differ from those used for other techniques. After establishing the problem, we explore various adaptation techniques, which we categorize into two main families. The first is parametric knowledge adaptation, which updates the parametric knowledge stored within LLMs; here we also cover real-time adaptation techniques, including model editing, which allows LLMs to be updated dynamically in production environments. The second is semi-parametric knowledge adaptation, which updates LLM parameters so that the model can better leverage external knowledge or tools through techniques such as retrieval-augmented generation (RAG) and agent-based systems.
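To make the semi-parametric family concrete, the following is a minimal, self-contained sketch of the retrieval step behind RAG: a toy bag-of-words retriever selects the most similar document and prepends it as context to the prompt that would be sent to the (otherwise unchanged) LLM. The function names and the two-document corpus are illustrative assumptions, not part of the tutorial itself.

```python
import math
from collections import Counter

def bow_vector(text):
    # Lowercased bag-of-words term counts for a piece of text.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = math.sqrt(sum(v * v for v in a.values())) * \
          math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def retrieve(query, corpus, k=1):
    # Rank documents by similarity to the query and return the top-k.
    q = bow_vector(query)
    ranked = sorted(corpus, key=lambda d: cosine(q, bow_vector(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, corpus):
    # A RAG prompt prepends retrieved context to the user query;
    # the augmented prompt is what the LLM actually sees.
    context = "\n".join(retrieve(query, corpus))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

corpus = [
    "The 2024 fiscal report shows revenue grew 12 percent.",
    "Model editing updates specific facts inside an LLM.",
]
print(build_prompt("How much did revenue grow?", corpus))
```

Production systems replace the bag-of-words scorer with dense embeddings and a vector index, but the control flow (retrieve, then augment the prompt) is the same; semi-parametric adaptation then fine-tunes the model to use such retrieved context more effectively.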