Knowledge Distillation (KD) has emerged as a powerful technique for model compression, enabling lightweight student networks to benefit from the performance of large, redundant teacher networks. However, the inherent capacity gap between teacher and student often limits how much the student can gain. The expressiveness of pretrained teacher networks raises a compelling research question: is there a type of network that can inherit not only the teacher's structure but also as much of its knowledge as possible? And how would such an inheriting network perform compared with student networks distilled from the same teacher? To explore these questions, we propose InherNet, a neural network inheritance method that applies asymmetric low-rank decomposition to the teacher's weights and reconstructs a lightweight yet expressive network without significant architectural disruption. By initializing the low-rank factors with Singular Value Decomposition (SVD), InherNet preserves the teacher's principal knowledge while balancing depth, width, and compression efficiency. Experiments on unimodal and multimodal tasks show that InherNet outperforms student networks with similar parameter counts. Our findings point to a promising direction for efficient model compression beyond traditional distillation.
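As a concrete illustration of the core operation, the sketch below factorizes a single linear layer with a truncated SVD so that the two resulting factors inherit the teacher's dominant singular directions. This is a minimal sketch under stated assumptions, not InherNet's actual implementation: the helper name `factorize_linear`, the even square-root split of singular values between the two factors, and the use of PyTorch are all illustrative choices, and InherNet's asymmetric decomposition may allocate ranks and factors differently.

```python
# Minimal sketch: SVD-based low-rank factorization of a teacher layer.
# Hypothetical helper for illustration only; not InherNet's real API.
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Approximate W (out x in) as B @ A, with A (rank x in) and
    B (out x rank) initialized from the top-`rank` singular triples.
    Parameter count drops from out*in to rank*(out + in)."""
    W = layer.weight.data                         # shape: (out, in)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    sqrt_S = torch.sqrt(S[:rank])                 # split singular values evenly
    A = nn.Linear(layer.in_features, rank, bias=False)
    B = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    A.weight.data = sqrt_S[:, None] * Vh[:rank]   # (rank, in)
    B.weight.data = U[:, :rank] * sqrt_S[None, :] # (out, rank)
    if layer.bias is not None:
        B.bias.data = layer.bias.data.clone()     # bias carried by second factor
    return nn.Sequential(A, B)

# Usage: factorize one layer and measure how well the truncated
# factors reproduce the teacher's mapping.
teacher_fc = nn.Linear(1024, 1024)
compressed = factorize_linear(teacher_fc, rank=64)
x = torch.randn(8, 1024)
err = (teacher_fc(x) - compressed(x)).norm() / teacher_fc(x).norm()
print(f"relative output error at rank 64: {err.item():.3f}")
```

At rank r, the layer's parameters shrink from out·in to r·(out + in), which is where the compression comes from; the printed relative error indicates how much of the teacher's mapping the retained singular directions preserve.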