The need to efficiently execute different Deep Neural Networks (DNNs) on the same computing platform, coupled with the requirement for easy scalability, makes Multi-Chip Module (MCM)-based accelerators a preferred design choice. Such an accelerator brings together heterogeneous sub-accelerators in the form of chiplets, interconnected by a Network-on-Package (NoP). This paper addresses the challenge of selecting the most suitable sub-accelerators, configuring them, determining their optimal placement in the NoP, and mapping the layers of a predetermined set of DNNs both spatially and temporally. The objective is to minimise execution time and energy consumption during parallel execution while also minimising the overall cost, specifically the silicon area, of the accelerator. This paper presents MOHaM, a framework for multi-objective hardware-mapping co-optimisation for multi-DNN workloads on chiplet-based accelerators. MOHaM exploits a multi-objective evolutionary algorithm specialised for the given problem through several customised genetic operators. MOHaM is evaluated against state-of-the-art Design Space Exploration (DSE) frameworks on different multi-DNN workload scenarios. The solutions discovered by MOHaM are Pareto optimal with respect to those found by the state-of-the-art. Specifically, MOHaM-generated accelerator designs can reduce latency by up to $96\%$ and energy by up to $96.12\%$.