Algorithmic Simplification of Neural Networks with Mosaic-of-Motifs

Large-scale deep learning models are well-suited for compression. Methods like pruning, quantization, and knowledge distillation have been used to achieve massive reductions in the number of model parameters, with marginal performance drops across a variety of architectures and tasks. This raises the central question: \emph{Why are deep neural networks suited for compression?} In this work, we take up the perspective of algorithmic complexity to explain this behavior. We hypothesize that the parameters of trained models have more structure and, hence, exhibit lower algorithmic complexity than the weights at (random) initialization. We further hypothesize that model compression methods harness this reduced algorithmic complexity to compress models. Although an unconstrained parameterization of model weights, $\mathbf{w} \in \mathbb{R}^n$, can represent arbitrary weight assignments, the solutions found during training exhibit repeatability and structure, making them algorithmically simpler than a generic program. To this end, we formalize the algorithmic complexity of $\mathbf{w}$ as its Kolmogorov complexity, $\mathcal{K}(\mathbf{w})$. We introduce a constrained parameterization $\widehat{\mathbf{w}}$ that partitions the parameters into blocks of size $s$ and restricts each block to be selected from a set of $k$ reusable motifs, with the assignment of motifs to blocks specified by a reuse pattern (or mosaic). The resulting method, $\textit{Mosaic-of-Motifs}$ (MoMos), yields an algorithmically simpler model parameterization than unconstrained models. Empirical evidence from multiple experiments shows that the algorithmic complexity of neural networks, measured using approximations to Kolmogorov complexity, can be reduced during training. This results in models that perform comparably to unconstrained models while being algorithmically simpler.
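
The block-and-motif constraint described above can be illustrated with a minimal sketch. It is not the authors' implementation: the sizes `n`, `s`, and `k` are arbitrary illustrative values, the helper names `build_mosaic_weights` and `compressed_size` are hypothetical, and the motifs and mosaic are drawn at random rather than learned. Because Kolmogorov complexity is uncomputable, the sketch uses zlib-compressed length as a crude upper-bound proxy; the specific approximations used in the paper are not stated in the abstract, so this choice is an assumption.

```python
import random
import struct
import zlib


def build_mosaic_weights(motifs, mosaic):
    """Assemble a flat weight vector by tiling motifs in the order given by the mosaic."""
    weights = []
    for motif_index in mosaic:
        weights.extend(motifs[motif_index])
    return weights


def compressed_size(weights):
    """Bytes after zlib compression of the float32 encoding.

    Assumption: used here only as a rough upper-bound proxy for K(w).
    """
    raw = struct.pack(f"<{len(weights)}f", *weights)
    return len(zlib.compress(raw, 9))


rng = random.Random(0)
n, s, k = 4096, 8, 16  # illustrative sizes, not taken from the paper
num_blocks = n // s

# k reusable motifs of length s, and a mosaic assigning one motif index per block.
motifs = [[rng.gauss(0.0, 1.0) for _ in range(s)] for _ in range(k)]
mosaic = [rng.randrange(k) for _ in range(num_blocks)]

w_hat = build_mosaic_weights(motifs, mosaic)   # constrained parameterization
w = [rng.gauss(0.0, 1.0) for _ in range(n)]    # unconstrained random weights

print("length:", len(w_hat))
print("compressed bytes (mosaic):", compressed_size(w_hat))
print("compressed bytes (unconstrained):", compressed_size(w))
```

In this toy setting, $\widehat{\mathbf{w}}$ is fully described by $k \cdot s$ motif values plus one motif index per block, which is why a general-purpose compressor should find it much shorter than an unconstrained random vector of the same length.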
