Large Language Models~(LLMs) have demonstrated remarkable capabilities in understanding, generating, and manipulating language. Through human-model interaction, LLMs can automatically understand human-issued instructions and output the expected content, which can significantly increase working efficiency. Among real-world demands, editing-oriented tasks account for a considerable proportion; they involve an interactive process of continuously refining existing text to meet specific criteria. Because such tasks require multi-round human-model interaction and the handling of complicated editing tasks, there is a pressing need for efficient general editing models. In this paper, we propose \underline{\textbf{G}}eneral \underline{\textbf{SP}}arse \underline{\textbf{E}}fficient \underline{\textbf{E}}diting Mo\underline{\textbf{D}}el~(\textbf{G-SPEED}), which fulfills diverse editing requirements with a single model while maintaining low computational cost. Specifically, we first propose a novel unsupervised clustering algorithm for text editing data to address the data scarcity problem. We then introduce a sparse editing model architecture to mitigate the inherently limited learning capabilities of small language models. Experimental results show that G-SPEED, with 508M parameters, can surpass LLMs equipped with 175B parameters. Our code and model checkpoints are available at \url{https://github.com/Banner-Z/G-SPEED}.