NAACL 2025 Tutorial: Adaptation of Large Language Models

This tutorial on the adaptation of large language models (LLMs) addresses the growing demand for models that go beyond the static capabilities of generic LLMs, and surveys dynamic, domain-specific, and task-adaptive adaptation techniques. Although general-purpose LLMs generalize well across a wide variety of tasks, they often underperform in specialized domains such as finance, healthcare, and code generation for underrepresented languages. Their static nature also keeps them from evolving with a changing world, and they are often extremely large, which makes them impractical and costly to deploy at scale. LLM adaptation has therefore drawn considerable attention since LLMs first appeared, and it is of core importance both to industry, which focuses on serving its target users, and to academia, which can benefit greatly from small but powerful LLMs.

To bridge the gap between what general-purpose LLMs offer and what these settings require, this tutorial provides an overview of LLM adaptation techniques. We begin by introducing LLM adaptation from both the data perspective and the model perspective. We then highlight how the evaluation metrics and benchmarks for adaptation differ from those used for other techniques. With the problems established, we explore a range of adaptation techniques, which we group into two main families. The first is parametric knowledge adaptation, which focuses on updating the knowledge stored in an LLM's parameters. Alongside it, we discuss real-time adaptation techniques, including model editing, which allows LLMs to be updated dynamically in production environments. The second is semi-parametric knowledge adaptation, where the goal is to update LLM parameters so that the model better leverages external knowledge or tools, through techniques such as retrieval-augmented generation (RAG) and agent-based systems.
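To make the parametric versus semi-parametric distinction concrete, here is a minimal, hypothetical sketch of the retrieval half of a RAG pipeline: knowledge lives in an external document store, and adding a document changes what the model is shown without touching its weights. This is a toy illustration under stated assumptions, not the tutorial's method: `ToyRetriever`, `build_prompt`, the bag-of-words cosine scoring, and the prompt template are all invented for this example. Real systems typically use dense embeddings, pass the prompt to an LLM, and (per the semi-parametric family described above) may also train the LLM to make better use of the retrieved context, which this sketch does not show.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and split on non-alphanumeric characters (toy tokenizer, an assumption).
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class ToyRetriever:
    """Hypothetical external knowledge store: updating it needs no model training."""

    def __init__(self) -> None:
        self.docs: list[str] = []
        self.vecs: list[Counter] = []

    def add(self, doc: str) -> None:
        # New knowledge is added to the store, not to the model's parameters.
        self.docs.append(doc)
        self.vecs.append(Counter(tokenize(doc)))

    def top_k(self, query: str, k: int = 1) -> list[str]:
        # Rank stored documents by similarity to the query and return the best k.
        q = Counter(tokenize(query))
        ranked = sorted(range(len(self.docs)),
                        key=lambda i: cosine(q, self.vecs[i]),
                        reverse=True)
        return [self.docs[i] for i in ranked[:k]]


def build_prompt(question: str, retriever: ToyRetriever, k: int = 1) -> str:
    # Prepend retrieved context to the question (hypothetical prompt template).
    context = "\n".join(f"- {d}" for d in retriever.top_k(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


if __name__ == "__main__":
    store = ToyRetriever()
    store.add("Clinic hours: the clinic opens at 9am.")
    store.add("Refund policy: refunds take 14 days.")
    # Only the refund-policy document is placed in the context for this question.
    print(build_prompt("How many days do refunds take?", store, k=1))
```

Changing the answer the model sees only requires calling `store.add` with an updated document; a parametric approach would instead fine-tune or edit the model's weights.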

翻译：本大语言模型适配教程旨在响应日益增长的需求，即超越通用大语言模型的静态能力，通过概述动态、领域特定及任务自适应的LLM适配技术来满足这一需求。尽管通用大语言模型已在多种任务上展现出强大的泛化能力，但在金融、医疗以及针对资源稀缺语言的代码生成等专业领域，其表现往往不尽如人意。此外，其静态特性限制了其随世界变化而演进的能力，且模型规模通常极为庞大，导致大规模部署既不切实际又成本高昂。因此，自大语言模型诞生以来，其适配技术便备受关注，并具有核心重要性——对于聚焦于服务目标用户的工业界，以及能够从小型但功能强大的LLM中极大受益的学术界而言皆是如此。为弥补这一差距，本教程旨在系统概述大语言模型的适配技术。我们首先从数据和模型两个视角介绍LLM适配的基本概念。随后，我们将重点阐述其评估指标与基准测试如何区别于其他技术。在明确问题定义后，我们将深入探讨多种适配技术。我们将适配技术主要分为两大类。第一类是参数化知识适配，其核心在于更新大语言模型内部的参数化知识。此外，我们将讨论实时适配技术，包括模型编辑，该技术使得LLM能够在生产环境中动态更新。第二类适配是半参数化知识适配，其目标是通过检索增强生成（RAG）和基于智能体的系统等技术，更新LLM参数以更好地利用外部知识或工具。