Weight-sharing quantization has emerged as a technique for reducing energy expenditure during inference in large neural networks by constraining their weights to a limited set of values. However, existing weight-sharing methods often decide how to treat a weight from its value alone, neglecting the unique role that the weight's position plays. This paper proposes a probabilistic framework, based on Bayesian neural networks (BNNs) and a variational relaxation, that uses each weight's position-specific learned uncertainty distribution to identify which weights can be moved to which cluster centre, and to what degree. We introduce a new initialisation setting and a regularisation term that allow BNNs to be trained under complex dataset-model combinations. By leveraging the flexibility in weight values captured by a probability distribution, we enhance noise resilience and downstream compressibility. Our iterative clustering procedure demonstrates superior compressibility and higher accuracy compared to state-of-the-art methods on both ResNet models and the more complex transformer-based architectures. In particular, our method exceeds the top-1 accuracy of the state-of-the-art quantization method by 1.6% on ImageNet using DeiT-Tiny, with its more than 5 million weights now represented by only 296 unique values.
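To make the central idea concrete, the following is a minimal sketch of uncertainty-aware weight sharing. It is not the procedure proposed in this paper: it assumes each weight already has a learned mean `mu` and standard deviation `sigma`, and it uses a simple hypothetical rule in which a weight is snapped to its nearest cluster centre only if that centre lies within `max_sigmas` standard deviations of the mean. The function name, the threshold rule, and the toy values are all illustrative assumptions.

```python
import numpy as np

def assign_to_centres(mu, sigma, centres, max_sigmas=1.0):
    """Snap each weight to its nearest centre when that centre lies within
    max_sigmas standard deviations of the weight's mean; otherwise keep
    the weight at its mean. (Illustrative rule, not the paper's method.)"""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    centres = np.asarray(centres, dtype=float)

    # Distance from every weight to every centre: shape (n_weights, n_centres).
    dist = np.abs(mu[:, None] - centres[None, :])
    nearest = dist.argmin(axis=1)
    nearest_dist = dist[np.arange(mu.size), nearest]

    # A weight is movable only if its own uncertainty covers the shift.
    movable = nearest_dist <= max_sigmas * sigma
    quantised = np.where(movable, centres[nearest], mu)
    return quantised, movable

# Toy example: four weights with position-specific uncertainties.
mu = np.array([0.11, 0.52, -0.24, 0.90])
sigma = np.array([0.05, 0.01, 0.10, 0.50])
centres = np.array([-0.25, 0.0, 0.5, 1.0])

quantised, movable = assign_to_centres(mu, sigma, centres)
print(quantised)  # weights after the uncertainty-gated snap
print(movable)    # which weights were allowed to move
```

In this toy example the weight at 0.52 is only 0.02 from a centre yet stays in place because its uncertainty is small, whereas the weight at 0.90 moves 0.10 to reach a centre because its uncertainty is large. A rule based on value alone could not make this distinction.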