With the great success of Deep Neural Networks (DNNs), the design of efficient hardware accelerators has attracted wide interest in the research community. Existing research explores two architectural strategies: sequential layer execution and layer-wise pipelining. While the former supports a wider range of models, the latter is favoured for its greater customization and efficiency. A challenge for the layer-wise pipelining architecture is its substantial demand for on-chip memory for weight storage, which impedes the deployment of large-scale networks on resource-constrained devices. This paper introduces AutoWS, a pioneering memory management methodology that exploits both on-chip and off-chip memory to optimize weight storage within a layer-wise pipelining architecture, taking advantage of its static schedule. Building on a comprehensive investigation of both the hardware design and the Design Space Exploration, our methodology is fully automated and enables the deployment of large-scale DNN models on resource-constrained devices, which was not possible in existing works that target layer-wise pipelining architectures. AutoWS is open-source: https://github.com/Yu-Zhewen/AutoWS