Continual learning has emerged as an important research direction because retraining large language models (LLMs) from scratch whenever new data becomes available is infeasible. Of particular interest is the domain-adaptive pre-training (DAPT) paradigm, which continues training a pre-trained language model to adapt it to a domain it was not originally trained on. In this work, we evaluate the feasibility of DAPT in a low-resource setting, namely the Nepali language. We use synthetic data to continue training Llama 3 8B in a 4-bit QLoRA setting to adapt it to Nepali. We evaluate the adapted model on performance, forgetting, and knowledge acquisition. We compare the base and final models on their Nepali generation abilities and their performance on popular benchmarks, and conduct case studies to probe their linguistic knowledge of Nepali. We observe some unsurprising forgetting in the final model, but also, surprisingly, find that increasing the number of shots during evaluation yields larger percentage gains for the final model (up to 19.29%) than for the base model (4.98%), suggesting latent retention. We also examine layer-head self-attention heatmaps to establish the dependency-resolution abilities of the final model in Nepali.