Resource-constrained Edge Devices (EDs), e.g., IoT sensors and microcontroller units, are expected to make intelligent decisions using Deep Learning (DL) inference at the edge of the network. Toward this end, there is a significant research effort in developing tinyML models, i.e., DL models with reduced computation and memory requirements that can be embedded on these devices. However, tinyML models have lower inference accuracy than larger DL models. On a different front, Deep Neural Network (DNN) partitioning and inference offloading techniques have been studied for distributed DL inference between EDs and Edge Servers (ESs). In this paper, we explore Hierarchical Inference (HI), a novel approach proposed by Vishnu et al. (2023, arXiv:2304.00891v1) for performing distributed DL inference at the edge. Under HI, for each data sample, an ED first uses a local algorithm (e.g., a tinyML model) for inference. Only if the local inference is incorrect or, depending on the application, further assistance is required from large DL models on the edge or cloud does the ED offload the data sample. At first glance, HI seems infeasible, as the ED, in general, cannot know whether the local inference is sufficient. Nevertheless, we demonstrate the feasibility of implementing HI for machine fault detection and image classification applications. We demonstrate its benefits through quantitative analysis and argue that HI can reduce latency, bandwidth usage, and energy consumption in edge AI systems.
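The abstract does not say how an ED decides that its local inference is insufficient. One common heuristic, used here purely as an illustrative assumption and not as the paper's stated method, is to offload a sample whenever the local model's top-class softmax probability falls below a threshold. In the sketch below, `hierarchical_inference`, `local_model`, `remote_model`, the threshold value 0.8, and the toy stubs are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: shift by the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


@dataclass
class HIResult:
    label: int       # final predicted class
    offloaded: bool  # True if the sample was sent to the ES / cloud


def hierarchical_inference(
    sample: Any,
    local_model: Callable[[Any], Sequence[float]],  # hypothetical tinyML model returning logits
    remote_model: Callable[[Any], int],             # hypothetical large DL model on the ES
    threshold: float = 0.8,                         # assumed confidence threshold (tunable)
) -> HIResult:
    """Accept the local inference if its top-class probability reaches the threshold;
    otherwise offload the sample and use the remote model's label (assumed decision rule)."""
    probs = softmax(local_model(sample))
    confidence = max(probs)
    if confidence >= threshold:
        return HIResult(label=probs.index(confidence), offloaded=False)
    return HIResult(label=remote_model(sample), offloaded=True)


# Toy demo: each "sample" is already a logit vector, so the local model is the identity,
# and the remote model is a stub that always answers class 2.
samples = [[4.0, 0.0, 0.0], [1.0, 0.9, 0.8], [0.0, 5.0, 0.0]]
results = [hierarchical_inference(s, lambda x: x, lambda x: 2) for s in samples]
offload_fraction = sum(r.offloaded for r in results) / len(results)
print([r.label for r in results], offload_fraction)
```

Only the ambiguous middle sample is offloaded here, so the ED transmits one of three samples. The fraction of offloaded samples is what drives the latency, bandwidth, and energy trade-off that HI targets; raising the threshold improves accuracy at the cost of more offloading.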