Binarization is a powerful compression technique for neural networks that substantially reduces FLOPs, but it often causes a significant drop in model performance. Partial binarization techniques have been developed to address this issue, but a systematic approach to mixing binary and full-precision parameters in a single network is still lacking. In this paper, we propose a controlled approach to partial binarization that uses our MixBin strategy to create a budgeted binary neural network (B2NN). MixBin optimizes the mix of binary and full-precision components and allows the fraction of the network that remains binary to be selected explicitly. Our experiments show that B2NNs created using MixBin outperform those obtained from random or iterative searches and from state-of-the-art layer selection methods by up to 3% on the ImageNet-1K dataset. We also show that B2NNs outperform the structured pruning baseline by approximately 23% at the extreme FLOP budget of 15%, and that they perform well in object tracking, with up to a 12.4% relative improvement over other baselines. Additionally, we demonstrate that B2NNs developed by MixBin can be transferred across datasets, and that in some cases the transferred networks outperform those obtained by applying MixBin directly on the downstream data.
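To make the idea of a budgeted mix concrete, the following is a minimal sketch of partial binarization under an explicit binary fraction. It is not the MixBin search itself, which the abstract does not specify: the layer selection rule (largest layers first), the parameter-count budget (rather than a FLOP budget), the sign-with-mean-scale binarizer, and all function names are assumptions made purely for illustration.

```python
from typing import Dict, List


def binarize(weights: List[float]) -> List[float]:
    """Sign binarization with one scale per layer (mean absolute value).

    Assumption: a common XNOR-Net-style scheme, used only as a stand-in.
    """
    if not weights:
        return []
    scale = sum(abs(w) for w in weights) / len(weights)
    return [scale if w >= 0 else -scale for w in weights]


def pick_binary_layers(sizes: Dict[str, int], binary_fraction: float) -> List[str]:
    """Hypothetical placeholder for the layer search.

    Greedily marks the largest layers as binary until at least
    `binary_fraction` of all parameters are covered.
    """
    target = binary_fraction * sum(sizes.values())
    chosen: List[str] = []
    covered = 0
    for name in sorted(sizes, key=sizes.get, reverse=True):
        if covered >= target:
            break
        chosen.append(name)
        covered += sizes[name]
    return chosen


def mix_precision(
    layers: Dict[str, List[float]], binary_fraction: float
) -> Dict[str, List[float]]:
    """Binarize the selected layers and keep the rest in full precision."""
    sizes = {name: len(w) for name, w in layers.items()}
    binary = set(pick_binary_layers(sizes, binary_fraction))
    return {
        name: binarize(w) if name in binary else list(w)
        for name, w in layers.items()
    }


if __name__ == "__main__":
    # Toy weights; real layers would be tensors, not short lists.
    toy = {
        "conv1": [0.5, -1.5],
        "conv2": [1.0, -2.0, 3.0, -2.0],
        "fc": [0.1, 0.2],
    }
    print(mix_precision(toy, binary_fraction=0.5))
```

In this toy setup, a binary fraction of 0.5 binarizes only `conv2`, which holds half of the parameters, and leaves the other layers untouched. A real budgeted search would replace `pick_binary_layers` with a procedure that accounts for accuracy and FLOPs.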