Recently, Large Language Models (LLMs) have achieved remarkable zero-shot performance across a variety of Natural Language Processing (NLP) tasks, especially text generation tasks. However, their large size often leads to high computational costs for both model training and online deployment. In this work, we present ALTER, a system that effectively builds multi-tAsk Learners with mixTure-of-task-adaptERs upon small language models (with fewer than 1B parameters). These learners address multiple NLP tasks simultaneously and capture the commonalities and differences between tasks, in order to support domain-specific applications. Specifically, in ALTER, we propose the Mixture-of-Task-Adapters (MTA) module as an extension to the transformer architecture of the underlying model, enabling it to capture both intra-task and inter-task knowledge. We further propose a two-stage training method that optimizes the collaboration between adapters at a small computational cost. Experimental results on a mixture of NLP tasks show that the proposed MTA architecture and two-stage training method achieve good performance. Based on ALTER, we have also produced MTA-equipped language models for various domains.
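The abstract does not specify the internals of the MTA module, so the following is only a minimal sketch of a generic mixture-of-adapters layer of the kind it describes, not the authors' implementation. It assumes that each task adapter is a bottleneck adapter (down-projection, ReLU, up-projection), that a softmax router computes mixing weights from the hidden state, and that the mixed adapter output is added back residually. All names (`BottleneckAdapter`, `MixtureOfTaskAdapters`, `d_bottleneck`, `router`) are hypothetical, and pure Python lists stand in for tensors.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def matvec(m, v):
    # m has shape (rows, cols), v has length cols; returns a vector of length rows.
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(logits):
    # Subtract the max logit for numerical stability.
    mx = max(logits)
    exps = [math.exp(z - mx) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class BottleneckAdapter:
    """Hypothetical task adapter: down-project, ReLU, up-project."""

    def __init__(self, d_model, d_bottleneck, rng):
        self.down = rand_matrix(d_bottleneck, d_model, rng)
        self.up = rand_matrix(d_model, d_bottleneck, rng)

    def __call__(self, h):
        return matvec(self.up, relu(matvec(self.down, h)))


class MixtureOfTaskAdapters:
    """Hypothetical MTA-style layer: softmax-weighted sum of adapters plus a residual."""

    def __init__(self, d_model, d_bottleneck, n_adapters, seed=0):
        rng = random.Random(seed)
        self.adapters = [
            BottleneckAdapter(d_model, d_bottleneck, rng) for _ in range(n_adapters)
        ]
        # Assumed router: one linear score per adapter, normalized with softmax.
        self.router = rand_matrix(n_adapters, d_model, rng)

    def gate(self, h):
        return softmax(matvec(self.router, h))

    def __call__(self, h):
        weights = self.gate(h)
        mixed = [0.0] * len(h)
        for w, adapter in zip(weights, self.adapters):
            out = adapter(h)
            mixed = [m + w * o for m, o in zip(mixed, out)]
        # Residual connection: the adapters only add a correction to h.
        return [x + m for x, m in zip(h, mixed)]


# Usage: one hidden vector of size 8 routed through 3 hypothetical task adapters.
mta = MixtureOfTaskAdapters(d_model=8, d_bottleneck=2, n_adapters=3)
hidden = [0.5] * 8
output = mta(hidden)
```

Because the adapters sit on a residual path, a zero adapter output leaves the backbone's hidden state unchanged. This is one common reason adapter-based designs are cheap to add to a small pretrained model.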