This work extends Vision Transformer Pruning by means of matrix decomposition. Vision Transformer Pruning prunes the dimensions of the linear projection of the dataset by learning an importance score associated with each dimension, which reduces storage, run-time memory, and computational demands. In this paper we further reduce the dimension and complexity of the linear projection by implementing and comparing several matrix decomposition methods while preserving the important features that have been generated. We compare the accuracy scores reported in the original GitHub repository with the accuracy scores obtained using each matrix decomposition method, namely Singular Value Decomposition, four versions of QR Decomposition, and LU factorization, and on that basis select Singular Value Decomposition as the method that achieves our goal.
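As a minimal illustration of the idea, and not the implementation used in this work, a rank-r truncated Singular Value Decomposition can replace one projection weight matrix with two thinner factors. The sketch below uses NumPy; the function names `truncated_svd_factors` and `relative_error`, the 64 x 64 matrix shape, and the chosen rank are hypothetical and serve only to show how the parameter count and the reconstruction error can be compared.

```python
import numpy as np


def truncated_svd_factors(weight, rank):
    """Factor `weight` (out_dim x in_dim) into two thinner matrices of the given rank."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    left = u[:, :rank] * s[:rank]   # (out_dim, rank), singular values folded in
    right = vt[:rank, :]            # (rank, in_dim)
    return left, right


def relative_error(weight, left, right):
    """Frobenius-norm error of the low-rank reconstruction, relative to `weight`."""
    return np.linalg.norm(weight - left @ right) / np.linalg.norm(weight)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical projection weight: approximately rank 16 plus small noise.
    weight = rng.standard_normal((64, 16)) @ rng.standard_normal((16, 64))
    weight += 0.01 * rng.standard_normal((64, 64))

    left, right = truncated_svd_factors(weight, rank=16)
    print("parameters before:", weight.size)
    print("parameters after: ", left.size + right.size)
    print("relative error:   ", relative_error(weight, left, right))
```

Keeping the two factors separate is what saves storage and computation: a projection `weight @ x` becomes `left @ (right @ x)`, which is cheaper whenever the rank is well below half of the matrix dimension.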