This technical report describes an experiment on autoregressive pre-training of the Gemma2 2-billion-parameter large language model (LLM) on 10\% of the Lithuanian language component of CulturaX, viewed from the perspective of continual learning. We apply elastic weight consolidation (EWC) to the full set of the model's parameters and evaluate the model on language understanding benchmarks, comprising the Arc, Belebele, Gsm8K, Hellaswag, MMLU, TruthfulQA, and Winogrande sets (in both English and Lithuanian versions), as well as on perplexity benchmarks. We empirically demonstrate that EWC regularisation not only mitigates catastrophic forgetting but is also potentially beneficial for learning the new task with LLMs.
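For reference, a minimal sketch of the standard EWC penalty is given below; it is not a statement of the exact objective used in this report. Here $\mathcal{L}_{B}(\theta)$ denotes the loss on the new task, $\theta^{*}_{A}$ the parameters learned on the old task, $F_{i}$ the diagonal Fisher information, and $\lambda$ the regularisation strength; these symbols follow the usual EWC notation and are assumptions about how the penalty is instantiated here.
\begin{equation}
\mathcal{L}(\theta) = \mathcal{L}_{B}(\theta) + \frac{\lambda}{2} \sum_{i} F_{i} \left( \theta_{i} - \theta^{*}_{A,i} \right)^{2}
\end{equation}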