Maximizing the acceleration efficiency of deep neural networks (DNNs) requires jointly searching or designing three distinct yet tightly coupled aspects: the networks, the bitwidths, and the accelerators. The challenges of such a joint search, however, have not yet been fully understood or addressed. The key challenges are (1) the dilemma between exploding memory consumption, caused by the huge joint space, and settling for sub-optimal designs; (2) the discrete nature of the accelerator design space, which is coupled with yet different from the spaces of networks and bitwidths; and (3) the chicken-and-egg problem in network-accelerator co-search: co-search requires operation-wise hardware cost, which is unavailable during search because the optimal accelerator depends on the whole network and is itself still unknown. To tackle these challenges and enable optimal and fast development of DNN accelerators, we propose a framework dubbed Auto-NBA that jointly searches for the Networks, Bitwidths, and Accelerators by efficiently localizing the optimal design within the huge joint design space for each target dataset and acceleration specification. Auto-NBA integrates a heterogeneous sampling strategy that achieves unbiased search with constant memory consumption, and a novel joint-search pipeline equipped with a generic differentiable accelerator search engine. Extensive experiments and ablation studies validate that the networks and accelerators generated by Auto-NBA consistently outperform state-of-the-art designs (including co-search/exploration techniques, hardware-aware NAS methods, and DNN accelerators) in terms of search time, task accuracy, and accelerator efficiency. Our code is available at: https://github.com/RICE-EIC/Auto-NBA.
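To make the idea of searching a discrete accelerator design space by gradient descent more concrete, the following is a minimal sketch and not Auto-NBA's actual engine. It assumes a plain softmax relaxation over the candidate values of a single design knob, with learnable logits updated to minimize the expected hardware cost. The function name `relax_and_search`, the cost values, and the hyperparameters are all hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def relax_and_search(costs, steps=200, lr=0.5):
    """Pick one of several discrete choices by gradient descent.

    Assumption (not from the paper): each candidate value of a design
    knob has a known scalar cost. The discrete choice is relaxed into a
    probability vector softmax(logits), and the logits are trained to
    minimize the expected cost sum_i p_i * cost_i.
    """
    costs = np.asarray(costs, dtype=float)
    logits = np.zeros_like(costs)  # start from a uniform distribution
    for _ in range(steps):
        probs = softmax(logits)
        expected_cost = float(probs @ costs)
        # Analytic gradient of the expected cost w.r.t. the logits:
        # d/dlogit_j = p_j * (cost_j - expected_cost)
        grad = probs * (costs - expected_cost)
        logits -= lr * grad
    probs = softmax(logits)
    return int(np.argmax(probs)), probs


# Hypothetical costs for three candidate values of one accelerator knob.
best, probs = relax_and_search([3.0, 1.0, 2.0])
print("selected choice index:", best)
print("choice probabilities:", np.round(probs, 3))
```

In this toy setting the probability mass moves toward the cheapest candidate. In a joint search the cost would additionally depend on the sampled network and bitwidths, which this sketch deliberately leaves out.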