Convolutional neural networks (CNNs) are among the most widely used neural network architectures and achieve state-of-the-art performance in computer vision tasks. Although larger CNNs generally exhibit higher accuracy, their size can be effectively reduced by "tensorization" while maintaining accuracy. Tensorization consists of replacing the convolution kernels with compact decompositions, such as Tucker and canonical polyadic (CP) decompositions or quantum-inspired decompositions such as matrix product states, and directly training the factors in the decompositions to bias learning towards low-rank decompositions. But why does tensorization not seem to impact accuracy adversely? We explore this by assessing how truncating the convolution kernels of dense (untensorized) CNNs impacts their accuracy. Specifically, we truncated the kernels of (i) a vanilla four-layer CNN and (ii) ResNet-50, pre-trained for image classification on the CIFAR-10 and CIFAR-100 datasets. We found that kernels (especially those in deeper layers) could often be truncated along several cuts, resulting in a significant loss in kernel norm but not in classification accuracy. This suggests that such "correlation compression" (which underlies tensorization) is an intrinsic feature of how information is encoded in dense CNNs. We also found that aggressively truncated models could often recover their pre-truncation accuracy after only a few epochs of retraining, suggesting that compressing the internal correlations of convolution layers does not usually move the model to a worse minimum. Our results can be applied to tensorize and compress CNN models more effectively.
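The following is a minimal sketch of what truncating a kernel "along a cut" can look like, assuming two things the abstract does not specify. First, a cut is taken to be a bipartition of the four indices of a convolution kernel of shape (C_out, C_in, kH, kW). Second, truncation is assumed to keep the largest singular values of the matrix formed by grouping the indices on each side of the cut. The function name `truncate_along_cut` and the norm-loss definition 1 − ‖W_trunc‖/‖W‖ are illustrative choices, not the authors' exact procedure.

```python
import numpy as np

def truncate_along_cut(kernel, row_axes, rank):
    """Hypothetical helper: truncate a conv kernel across a cut of its axes via SVD.

    kernel:   array of shape (C_out, C_in, kH, kW)
    row_axes: axes grouped on one side of the cut (the rest form the other side)
    rank:     number of singular values kept across the cut
    Returns the truncated kernel (same shape) and the relative norm loss,
    defined here (an assumption) as 1 - ||W_trunc||_F / ||W||_F.
    """
    row_axes = tuple(row_axes)
    col_axes = tuple(a for a in range(kernel.ndim) if a not in row_axes)
    perm = row_axes + col_axes

    # Matricize: rows = indices on one side of the cut, columns = the other side.
    permuted = np.transpose(kernel, perm)
    n_rows = int(np.prod([kernel.shape[a] for a in row_axes]))
    matrix = permuted.reshape(n_rows, -1)

    # Keep only the `rank` largest singular values across the cut.
    u, s, vh = np.linalg.svd(matrix, full_matrices=False)
    k = min(rank, s.size)
    approx = (u[:, :k] * s[:k]) @ vh[:k, :]

    # Undo the reshape and the axis permutation to recover the kernel layout.
    approx = approx.reshape(permuted.shape)
    truncated = np.transpose(approx, np.argsort(perm))

    norm_loss = 1.0 - np.linalg.norm(truncated) / np.linalg.norm(kernel)
    return truncated, norm_loss

# Usage: a random stand-in for a trained 3x3 conv kernel, cut between the
# output channels and (input channels, kH, kW), keeping 16 singular values.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32, 3, 3))
W_t, loss = truncate_along_cut(W, row_axes=(0,), rank=16)
```

In an experiment of the kind described above, the truncated kernel would be written back into the network and test accuracy re-measured. Comparing `loss` with the accuracy change is what separates "large loss in kernel norm" from "loss in classification accuracy". Different `row_axes` choices give different cuts of the same kernel.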