Training deep convolutional neural networks (CNNs) by iterative optimization has been remarkably successful at finding high-performing parameters. However, modern CNN architectures often contain millions of parameters, so any given model for a single architecture resides in a massive parameter space. Models with similar loss can have drastically different characteristics, such as adversarial robustness, generalizability, and quantization robustness. For deep learning on the edge, quantization robustness is often crucial, yet finding a quantization-robust model can require significant effort. Recent works using Graph Hypernetworks (GHN) have shown remarkable performance in predicting high-performing parameters for varying CNN architectures. Inspired by these successes, we ask whether the graph representations of GHN-2 can also be leveraged to predict quantization-robust parameters, an approach we call GHN-Q. We conduct the first-ever study exploring the use of graph hypernetworks for predicting parameters of unseen quantized CNN architectures. We focus on a reduced CNN search space and find that GHN-Q can in fact predict quantization-robust parameters for various 8-bit quantized CNNs. Decent quantized accuracies are observed even with 4-bit quantization, despite GHN-Q not being trained on it. Quantized finetuning of GHN-Q at lower bitwidths may bring further improvements and is currently being explored.
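To make the 8-bit versus 4-bit distinction concrete, the following is a minimal sketch of uniform "fake" quantization (quantize, then dequantize) applied to a toy weight vector. It assumes symmetric per-tensor quantization, which the abstract does not specify, so it should not be read as the scheme used for GHN-Q. The names `fake_quantize` and `mean_squared_error` and the Gaussian toy weights are hypothetical, chosen only for illustration.

```python
import random


def fake_quantize(weights, num_bits):
    """Quantize then dequantize with a symmetric, per-tensor uniform grid.

    Assumption: this scheme is illustrative; the abstract does not state
    which quantization scheme GHN-Q is trained or evaluated with.
    """
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)  # nothing to scale
    scale = max_abs / qmax  # step size of the quantization grid
    # Round to the nearest grid point, clamp to the integer range, rescale.
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def mean_squared_error(a, b):
    """Average squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy stand-in for one layer's weights (hypothetical data, not from the paper).
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.05) for _ in range(1000)]

mse_by_bits = {}
for bits in (8, 4):
    mse_by_bits[bits] = mean_squared_error(weights, fake_quantize(weights, bits))
    print(f"{bits}-bit quantization MSE: {mse_by_bits[bits]:.3e}")
```

The coarser 4-bit grid has far fewer representable levels than the 8-bit grid, so the perturbation applied to the weights is larger. Parameters that are robust to quantization are those whose accuracy degrades little under this kind of perturbation.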