Tensor decompositions have been successfully applied to compress neural networks. Compression algorithms based on tensor decompositions commonly minimize the approximation error on the weights. Recent work that compresses multiple layers and then fine-tunes the compressed model assumes that this weight approximation error is a proxy for model performance. Surprisingly, little research has systematically evaluated which approximation errors can be used to choose the layer, the tensor decomposition method, and the level of compression. To close this gap, we perform an experimental study that tests whether this assumption holds across different layers and types of decomposition, and what effect fine-tuning has. We also include the approximation error on the features produced by a compressed layer, to test whether it provides a better proxy because it explicitly takes the data into account. We find that the approximation error on the weights correlates positively with the performance error, both before and after fine-tuning. Basing the approximation error on the features does not improve the correlation significantly. Although scaling the approximation error is commonly used to account for the different sizes of layers, the average correlation across layers is smaller than the correlation across all choices (i.e., layers, decompositions, and levels of compression) before fine-tuning. When the correlation is calculated across the different decompositions, the average rank correlation is larger than across all choices. This means that multiple decompositions can be considered for compression and that the approximation error can be used to choose between them.
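To make the two proxies concrete, the following is a minimal sketch, not the study's actual pipeline. It assumes a single dense layer compressed by truncated SVD (a stand-in for the tensor decompositions discussed above), random data in place of a trained model, and relative Frobenius-norm errors. The names `truncated_svd`, `relative_error`, and `spearman` are hypothetical helpers introduced for illustration. Because no trained model is available here, the rank correlation is computed between the two proxies only; in the study the correlation of interest is between an approximation error and the performance error.

```python
import numpy as np


def truncated_svd(W, rank):
    """Best rank-`rank` approximation of W in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]


def relative_error(reference, approximation):
    """Relative Frobenius-norm error, scaled by the size of the reference."""
    return np.linalg.norm(reference - approximation) / np.linalg.norm(reference)


def spearman(x, y):
    """Spearman rank correlation; assumes no ties in x or y."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return np.corrcoef(rank_x, rank_y)[0, 1]


rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128))   # hypothetical layer weights (out x in)
X = rng.standard_normal((128, 256))  # hypothetical input batch (in x samples)

ranks = [4, 8, 16, 32, 48]           # assumed compression levels
weight_errors = []
feature_errors = []
for r in ranks:
    W_r = truncated_svd(W, r)
    # Error on the weights: ignores the data entirely.
    weight_errors.append(relative_error(W, W_r))
    # Error on the features: compares layer outputs on the data X.
    feature_errors.append(relative_error(W @ X, W_r @ X))

# Rank correlation between the two proxies across compression levels.
rho = spearman(weight_errors, feature_errors)
print(weight_errors, feature_errors, rho)
```

Dividing by the norm of the reference is one simple way to scale the error so that layers of different sizes become comparable; other scalings are possible and the choice here is an assumption.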