This paper presents a unified framework for codifying and automating optimization strategies to efficiently deploy deep neural networks (DNNs) on resource-constrained hardware, such as FPGAs, while maintaining high performance, accuracy, and resource efficiency. Deploying DNNs on such platforms poses the significant challenge of balancing performance, resource usage (e.g., DSPs and LUTs), and inference accuracy, which often requires extensive manual effort and domain expertise. Our approach addresses two key issues: (i)~encoding custom optimization strategies and (ii)~enabling cross-stage optimization search. In particular, the proposed framework seamlessly integrates programmatic DNN optimization techniques with high-level synthesis (HLS)-based metaprogramming, leveraging advanced design space exploration (DSE) strategies such as Bayesian optimization to automate both top-down and bottom-up design flows, thereby reducing the need for manual intervention and domain expertise. In addition, the framework introduces customizable optimization, transformation, and control blocks to enhance DNN accelerator performance and resource efficiency. Experimental results demonstrate reductions of up to 92\% in DSP and 89\% in LUT usage for select networks while preserving accuracy, along with a 15.6-fold reduction in optimization time compared to grid search. These results highlight the potential for automating the generation of resource-efficient DNN accelerator designs with minimal effort.