Large Language Models (LLMs) have achieved remarkable success in various natural language processing tasks, including language modeling, understanding, and generation. However, the memory and computational costs of these models pose significant challenges for deployment on resource-limited devices. Structural pruning has emerged as a promising solution to reduce these costs without requiring post-processing steps. Prior structural pruning methods either follow structural dependencies at the cost of flexibility, or introduce non-trivial additional parameters through extra projection matrices. In this work, we propose a novel approach that relaxes the constraints imposed by conventional structural pruning methods and eliminates the structural dependence along the embedding dimension. Our dimension-independent structural pruning offers two main benefits. First, it enables different blocks to utilize different subsets of the feature maps. Second, by removing structural dependence, each block can have different widths along its input and output dimensions, significantly enhancing the flexibility of structural pruning. We evaluate our method on various LLMs, including OPT, LLaMA, LLaMA-2, Phi-1.5, and Phi-2. Experimental results demonstrate that our approach outperforms other state-of-the-art methods, showing for the first time that structural pruning can achieve accuracy comparable to semi-structural pruning.
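To make the idea concrete, below is a minimal PyTorch sketch of dimension-independent pruning, written as our own illustration rather than the paper's implementation. Each block gathers its own subset of embedding channels as input and writes its output back to a possibly different subset, so blocks no longer share one pruned embedding dimension; the index sets here are random placeholders, whereas the actual method would select them (e.g., by learned importance).

```python
import torch
import torch.nn as nn

class DimIndependentBlock(nn.Module):
    """Illustrative block whose input/output widths are pruned independently."""

    def __init__(self, d_model: int, in_keep: int, out_keep: int):
        super().__init__()
        # Per-block channel subsets along the embedding dimension.
        # Hypothetical random selection; a real method would learn these indices.
        self.register_buffer("in_idx", torch.randperm(d_model)[:in_keep])
        self.register_buffer("out_idx", torch.randperm(d_model)[:out_keep])
        # Weights are sized to the pruned widths, so parameters and FLOPs
        # shrink without any post-processing step.
        self.ff = nn.Sequential(
            nn.Linear(in_keep, 4 * in_keep),
            nn.GELU(),
            nn.Linear(4 * in_keep, out_keep),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Gather this block's input subset of the residual stream...
        h = self.ff(x[..., self.in_idx])
        # ...and add its output back only on its own output subset,
        # leaving the full-width residual stream intact.
        out = x.clone()
        out[..., self.out_idx] = out[..., self.out_idx] + h
        return out

x = torch.randn(2, 16, 512)  # (batch, seq, d_model)
blocks = nn.Sequential(
    DimIndependentBlock(512, in_keep=384, out_keep=256),
    DimIndependentBlock(512, in_keep=320, out_keep=448),  # different widths per block
)
print(blocks(x).shape)  # torch.Size([2, 16, 512]); embedding width unchanged
```

Note how, unlike dependence-aware structural pruning that forces every block to consume the same reduced embedding dimension, each block here chooses its own input and output subsets and widths, which is the flexibility the abstract describes.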