DNN inference can be accelerated by distributing the workload across a cluster of collaborating edge nodes. However, the heterogeneity of edge devices and the accuracy-performance trade-offs of DNN models create a complex design space that must be explored to meet inference performance requirements. In this work, we propose adaptive workload distribution for DNN inference that jointly considers the node-level heterogeneity of edge devices and application-specific accuracy and performance requirements. Our approach combinatorially optimizes heterogeneity-aware workload partitioning together with dynamic accuracy configuration of DNN models to provide performance and accuracy guarantees. We evaluated our approach on an edge cluster of Odroid XU4, Raspberry Pi 4, and Jetson Nano boards and achieved average gains of 41.52% in performance and 5.2% in output accuracy compared with state-of-the-art workload distribution strategies.
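The joint search described above can be illustrated with a small brute-force sketch. This is not the authors' method. Every element of it is an assumption made for illustration: the workload is a batch of independent inputs, and each node runs one of three hypothetical model variants (`small`, `medium`, `large`) whose accuracies and per-board throughputs are invented. Inputs are split in proportion to each node's throughput, with largest-remainder rounding. Every per-node variant assignment is then enumerated, and the sketch keeps the plan with the highest expected accuracy whose makespan meets a deadline. The names `Variant`, `partition`, and `plan` are hypothetical helpers.

```python
from dataclasses import dataclass
from itertools import product
import math


@dataclass(frozen=True)
class Variant:
    name: str
    accuracy: float  # hypothetical top-1 accuracy in [0, 1]


def partition(n_items, rates):
    """Split n_items across nodes in proportion to their rates (items/s).

    Largest-remainder rounding keeps the shares integral and summing to n_items.
    """
    total = sum(rates)
    exact = [n_items * r / total for r in rates]
    shares = [math.floor(x) for x in exact]
    leftover = n_items - sum(shares)
    # Hand the remaining items to the nodes with the largest fractional parts.
    order = sorted(range(len(rates)), key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def plan(n_items, deadline_s, throughput, variants):
    """Brute-force search over per-node variant choices (hypothetical helper).

    throughput[node][variant_name] is the assumed inference rate in items/s.
    Returns (expected_accuracy, makespan_s, variant_per_node, items_per_node)
    for the most accurate plan whose makespan meets the deadline, or None.
    """
    nodes = list(throughput)
    best = None
    for choice in product(variants, repeat=len(nodes)):
        rates = [throughput[n][v.name] for n, v in zip(nodes, choice)]
        shares = partition(n_items, rates)
        # Nodes are assumed to run in parallel, so the slowest one sets the makespan.
        makespan = max(s / r for s, r in zip(shares, rates))
        if makespan > deadline_s:
            continue  # violates the performance requirement
        # Expected accuracy: each item inherits the accuracy of the variant it ran on.
        acc = sum(s * v.accuracy for s, v in zip(shares, choice)) / n_items
        if best is None or acc > best[0] or (acc == best[0] and makespan < best[1]):
            best = (
                acc,
                makespan,
                {n: v.name for n, v in zip(nodes, choice)},
                dict(zip(nodes, shares)),
            )
    return best


# Hypothetical profile: these numbers are illustrative, not measurements.
VARIANTS = [Variant("small", 0.70), Variant("medium", 0.76), Variant("large", 0.80)]
THROUGHPUT = {
    "odroid_xu4": {"small": 20.0, "medium": 8.0, "large": 3.0},
    "rpi4": {"small": 25.0, "medium": 10.0, "large": 4.0},
    "jetson_nano": {"small": 80.0, "medium": 40.0, "large": 20.0},
}

if __name__ == "__main__":
    print(plan(100, 2.0, THROUGHPUT, VARIANTS))
```

For a fixed variant assignment, a throughput-proportional split minimizes the makespan up to rounding. It does not necessarily maximize accuracy, so this sketch is only a heuristic. Exhaustive enumeration also grows as V^K for V variants and K nodes. That is manageable for a three-board cluster, but larger clusters would need pruning or a smarter search.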