Conventional LLMs can struggle with heterogeneous corpora and subtle changes in conditions. Finetuning can cause catastrophic forgetting, while meta-learning on LLMs is limited by its complexity and poor scalability. In this paper, we activate the meta-signal $β$ within the SwiGLU blocks, yielding a meta-gating mechanism that adaptively adjusts the nonlinearity of the FFN. A hypernetwork dynamically produces $β$ from textual conditions, providing meta-controllability over LLMs. Tested on different condition types such as task, domain, persona, and style, our method outperforms finetuning and meta-learning baselines and generalizes reasonably to unseen tasks, condition types, or instructions. Our code is available at https://github.com/AaronJi/MeGan.
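To make the gating idea concrete, the following is a minimal NumPy sketch of a SwiGLU feed-forward block whose Swish gate, $x \cdot \sigma(βx)$, takes $β$ from a small hypernetwork instead of fixing it at 1. It is an illustration under stated assumptions, not the implementation in the MeGan repository: the function names, the two-layer hypernetwork, the `1 + tanh` parameterization that keeps $β$ in (0, 2), and the per-channel shape of $β$ are all hypothetical choices, and the condition embedding stands in for whatever text encoder produces the textual condition.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def swish(x, beta):
    # Swish_beta(x) = x * sigmoid(beta * x).
    # beta = 1 gives SiLU, beta = 0 gives x / 2, large beta approaches ReLU.
    return x * sigmoid(beta * x)


def swiglu_ffn(x, W_gate, W_up, W_down, beta=1.0):
    # SwiGLU feed-forward block: (Swish_beta(x W_gate) * (x W_up)) W_down.
    return (swish(x @ W_gate, beta) * (x @ W_up)) @ W_down


def hyper_beta(cond_emb, H1, H2):
    # Hypothetical hypernetwork (an assumption, not the paper's design):
    # a two-layer MLP mapping a condition embedding to one beta per FFN channel.
    # The 1 + tanh(.) output keeps beta in (0, 2) and equals 1 when H2 is zero,
    # so the block starts out identical to a standard SwiGLU.
    hidden = np.tanh(cond_emb @ H1)
    return 1.0 + np.tanh(hidden @ H2)


rng = np.random.default_rng(0)
d_model, d_ff, d_cond, d_hid = 8, 16, 4, 6

x = rng.normal(size=(2, d_model))          # two token representations
W_gate = rng.normal(size=(d_model, d_ff))
W_up = rng.normal(size=(d_model, d_ff))
W_down = rng.normal(size=(d_ff, d_model))

cond_emb = rng.normal(size=(d_cond,))      # stand-in for an encoded textual condition
H1 = rng.normal(size=(d_cond, d_hid))
H2 = 0.1 * rng.normal(size=(d_hid, d_ff))

beta = hyper_beta(cond_emb, H1, H2)        # shape (d_ff,), broadcast over the batch
y_conditioned = swiglu_ffn(x, W_gate, W_up, W_down, beta)
y_baseline = swiglu_ffn(x, W_gate, W_up, W_down)  # beta fixed at 1
```

In this sketch only `H1` and `H2` would need training to add conditioning, since a zero-initialized `H2` reproduces the unconditioned block exactly; whether the paper freezes the base weights in this way is not stated in the abstract.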