Distributed systems are employed in a variety of applications, e.g., robotics and autonomous driving, to achieve higher flexibility and robustness. Data-flow-centric applications such as Deep Neural Network (DNN) inference benefit from partitioning the workload across multiple compute nodes in terms of performance and energy efficiency. However, mapping large models onto distributed embedded systems is a complex task due to low-latency and high-throughput requirements combined with strict energy and memory constraints. In this paper, we present a novel approach for hardware-aware layer scheduling of DNN inference in distributed embedded systems. To this end, our proposed framework uses a graph-based algorithm to automatically find beneficial partitioning points in a given DNN. Each partitioning point is evaluated against essential system metrics such as accuracy and memory utilization while respecting the corresponding system constraints. We demonstrate our approach by analyzing the impact of inference partitioning on various performance metrics of six different DNNs. As an example, we achieve a 47.5 % throughput increase for EfficientNet-B0 inference partitioned onto two platforms while maintaining high energy efficiency.