Conventional LLMs can struggle with corpus heterogeneity and subtle changes in conditions. Finetuning can cause catastrophic forgetting, while the application of meta-learning to LLMs is limited by its complexity and poor scalability. In this paper, we activate the meta-signal $β$ within the SwiGLU blocks, yielding a meta-gating mechanism that adaptively adjusts the nonlinearity of the FFN. A hypernetwork dynamically produces $β$ from textual conditions, providing meta-controllability over LLMs. Tested on different condition types such as task, domain, persona, and style, our method outperforms finetuning and meta-learning baselines and generalizes reasonably to unseen tasks, condition types, or instructions. Our code is available at https://github.com/AaronJi/MeGan.
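To make the gating idea concrete, the following is a minimal sketch of a SwiGLU feed-forward block whose Swish slope $β$ is supplied by a small hypernetwork. It is not the released implementation: the function names (`hyper_beta`, `swiglu_ffn`), the single linear layer used as the hypernetwork, the softplus that keeps $β$ positive, and the toy weights are all assumptions made for illustration. In a real model the condition vector would be an embedding of the textual condition.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def swish(x, beta):
    # Swish_beta(x) = x * sigmoid(beta * x).
    # beta = 1 is the usual SiLU; beta = 0 gives x / 2; large beta approaches ReLU.
    return x * sigmoid(beta * x)

def softplus(z):
    # Stable softplus; used here only to keep beta positive (an assumption).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def hyper_beta(cond, w, b):
    # Hypothetical hypernetwork: one linear layer mapping a condition
    # embedding to a scalar beta. The paper's architecture may differ.
    return softplus(sum(wi * ci for wi, ci in zip(w, cond)) + b)

def swiglu_ffn(x, W_gate, W_up, W_down, beta):
    # SwiGLU FFN: W_down @ (Swish_beta(W_gate @ x) * (W_up @ x)).
    gate = [swish(g, beta) for g in matvec(W_gate, x)]
    up = matvec(W_up, x)
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(W_down, hidden)

# Toy usage: same input and FFN weights, two different condition embeddings.
x = [1.0, 2.0]
W_gate = [[1.0, 0.0], [0.0, 1.0]]
W_up = [[1.0, 0.0], [0.0, 1.0]]
W_down = [[1.0, 1.0]]
w_hyper, b_hyper = [1.0, -1.0], 0.0  # illustrative hypernetwork weights

cond_a, cond_b = [2.0, 0.0], [0.0, 2.0]
out_a = swiglu_ffn(x, W_gate, W_up, W_down, hyper_beta(cond_a, w_hyper, b_hyper))
out_b = swiglu_ffn(x, W_gate, W_up, W_down, hyper_beta(cond_b, w_hyper, b_hyper))
print(out_a, out_b)
```

The FFN weights are shared across conditions in this sketch; only the scalar $β$ changes, so the condition alters the shape of the nonlinearity rather than the weights themselves.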