As applications built on deep neural networks (DNNs) proliferate, achieving high hardware resource efficiency across diverse workloads on heterogeneous computing platforms is non-trivial. Prior works propose dedicated architectures to maximize resource efficiency; however, when workloads are diverse, a mismatch between the hardware and the workloads always remains. Other works propose overlay architectures that dynamically switch dataflow for different workloads, but these are still limited by the granularity of their flexibility and incur significant resource inefficiency. To solve this problem, we propose FILCO, a flexible composing architecture that efficiently matches diverse workloads to achieve optimal storage and computation resource efficiency. FILCO can be reconfigured in real time and flexibly composed into either a single unified accelerator or multiple independent accelerators. We also propose the FILCO framework, which includes an analytical model with a two-stage design space exploration (DSE) that finds the optimal design point. We evaluate the FILCO framework on the 7nm AMD Versal VCK190 board. Compared with prior works, our design achieves 1.3x to 5x improvements in throughput and hardware efficiency across diverse workloads.