We propose TCSP, a novel method for compressing transformer models by reducing their hidden size. By projecting the entire transformer model into a subspace, we enable matrix operations between the model's weight matrices and features in a reduced-dimensional space, which leads to significant reductions in model parameters and computing resources. To construct this subspace, we decompose a feature matrix, built from features extracted at different layers for sampled data instances, to obtain a projection matrix. For evaluation, we apply TCSP to compress T5 and BERT models on the GLUE and SQuAD benchmarks. Experimental results show that TCSP achieves a compression ratio of 44\% with at most 1.6\% degradation in accuracy, surpassing or matching prior compression methods. Furthermore, TCSP is compatible with other methods that compress the filter size and the attention head size.
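The core subspace idea can be illustrated with a small sketch. It rests on several assumptions that the abstract does not confirm. It uses SVD as the decomposition. It uses a single ReLU feed-forward block with no biases, LayerNorm, or attention. It also assumes the sampled features lie exactly in a `k`-dimensional subspace. The helper names `projection_from_features` and `compress_ffn` are hypothetical, and this is not the authors' implementation. The projection `P` (shape `d × k`) is folded into the weights, so the compressed block reads and writes `k`-dimensional hidden states instead of `d`-dimensional ones.

```python
import numpy as np

def projection_from_features(features: np.ndarray, k: int) -> np.ndarray:
    """Hypothetical helper. features: (n, d) hidden states stacked from several
    layers on sampled inputs. Returns P of shape (d, k) with orthonormal columns.
    Assumption: SVD is used as the decomposition."""
    # The top-k right singular vectors span the dominant directions of the features.
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    return vt[:k].T

def compress_ffn(w_in: np.ndarray, w_out: np.ndarray, p: np.ndarray):
    """Hypothetical helper. w_in: (d, m) hidden -> intermediate; w_out: (m, d)
    intermediate -> hidden. Folding P into both weights shrinks the hidden size."""
    return p.T @ w_in, w_out @ p   # shapes (k, m) and (m, k)

rng = np.random.default_rng(0)
d, m, k, n = 16, 32, 4, 200
# Synthetic features confined to a k-dim subspace (an idealized assumption).
basis = np.linalg.qr(rng.standard_normal((d, k)))[0]
features = rng.standard_normal((n, k)) @ basis.T

p = projection_from_features(features, k)
w_in = rng.standard_normal((d, m))
w_out = rng.standard_normal((m, d))
w_in_c, w_out_c = compress_ffn(w_in, w_out, p)

x = features[:5]
# Original block, with its output expressed in subspace coordinates.
full = np.maximum(x @ w_in, 0) @ w_out @ p
# Compressed block operating directly on k-dim hidden states.
small = np.maximum((x @ p) @ w_in_c, 0) @ w_out_c

print(np.allclose(full, small))
print(w_in.size + w_out.size, "->", w_in_c.size + w_out_c.size)
```

In this idealized setting the two outputs agree, and the block's parameter count shrinks by a factor of `d / k`. With real transformer features, which are not exactly low rank, `x ≈ x P Pᵀ` holds only approximately. Choosing `k` then trades accuracy against compression.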