Revisiting Data Augmentation in Model Compression: An Empirical and Comprehensive Study

The excellent performance of deep neural networks is usually accompanied by a large number of parameters and computations, which limits their use on resource-limited edge devices. To address this issue, many methods such as pruning, quantization, and knowledge distillation have been proposed to compress neural networks, and they have achieved significant breakthroughs. However, most of these compression methods focus on the architecture or the training method of neural networks and ignore the influence of data augmentation. In this paper, we revisit the use of data augmentation in model compression and present a comprehensive study of the relation between model size and the optimal data augmentation policy. In summary, we make three main observations: (A) Models of different sizes prefer data augmentation of different magnitudes. Hence, in iterative pruning, data augmentation with varying magnitudes leads to better performance than data augmentation with a constant magnitude. (B) Data augmentation with a high magnitude may significantly improve the performance of large models but harm the performance of small models. Fortunately, small models can still benefit from strong data augmentation by first learning it with "additional parameters" and then discarding these "additional parameters" during inference. (C) The prediction of a pre-trained large model can be used to measure the difficulty of a data augmentation, and can therefore serve as a criterion for designing better data augmentation policies. We hope this paper will promote more research on the use of data augmentation in model compression.
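Observations (A) and (C) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything here is an assumption made for illustration: the function names `magnitude_for_sparsity` and `augmentation_difficulty` are hypothetical, the linear schedule (weaker augmentation as the model shrinks) is only one possible reading of observations (A) and (B), and cross-entropy is only one plausible way to turn a pre-trained model's prediction into a difficulty score.

```python
import math


def magnitude_for_sparsity(remaining_fraction, max_magnitude=10.0, min_magnitude=2.0):
    """Hypothetical linear schedule: lower augmentation magnitude for smaller models.

    remaining_fraction is the share of parameters left after pruning (1.0 = unpruned).
    The magnitude bounds are illustrative values, not taken from the paper.
    """
    if not 0.0 <= remaining_fraction <= 1.0:
        raise ValueError("remaining_fraction must be in [0, 1]")
    return min_magnitude + (max_magnitude - min_magnitude) * remaining_fraction


def augmentation_difficulty(teacher_logits, label):
    """Assumed difficulty score: cross-entropy of a pre-trained model's
    prediction on one augmented sample, given the true label index.

    A higher value means the large model is less sure of the true class,
    which this sketch treats as a harder augmentation.
    """
    peak = max(teacher_logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in teacher_logits))
    return log_norm - teacher_logits[label]


if __name__ == "__main__":
    # Iterative pruning that keeps 80% of the remaining weights each round.
    remaining = 1.0
    for round_index in range(4):
        magnitude = magnitude_for_sparsity(remaining)
        print(f"round {round_index}: remaining={remaining:.2f}, magnitude={magnitude:.2f}")
        remaining *= 0.8

    # Made-up logits from a pre-trained model for a weakly and a strongly augmented image.
    print(augmentation_difficulty([5.0, 0.0, 0.0], label=0))
    print(augmentation_difficulty([0.5, 0.0, 0.0], label=0))
```

In practice the schedule and the difficulty score would need to be tuned against validation accuracy. The sketch only shows where the model size and the large model's prediction could enter an augmentation policy.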
