Split learning (SL) addresses the limitations of running deep learning inference directly on low-power edge/IoT nodes by executing part of the inference on the sensor and offloading the remainder to a companion device. Despite its promise, the inference latency of SL on constrained hardware under realistic low-power wireless protocols remains unexplored. This paper presents the first experimental latency benchmark of TinyML-based SL on ESP32-S3 boards, comparing four wireless communication protocols (UDP, TCP, ESP-NOW, and BLE). We also analyze how the choice of split point in different models (MobileNet-V2 and ResNet50) affects communication and computation overhead, and how this choice can be used to minimize end-to-end inference latency. We propose a Beam Search-based split-point optimization algorithm that minimizes end-to-end latency and compare it with Greedy Search, First-Fit, Random-Fit, and Brute Force. ESP-NOW achieves the lowest round-trip time (RTT, 3.6 s) and therefore serves as the base protocol for the algorithm, which delivers near-optimal latency with a processing time of 0.1 s for 5 devices.
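The abstract does not spell out the latency model or the search procedure. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a chain of devices in which device k runs a contiguous block of layers and then forwards the last activation to device k + 1. It also assumes hypothetical per-layer compute profiles (`compute`), activation sizes (`act_bytes`), and link throughputs (`throughput`); all names and numbers are illustrative. The beam search places one split per hop, ranks partial chains by accumulated latency, and keeps the `beam_width` cheapest. An exhaustive search over all split tuples stands in for the Brute Force baseline.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

# Hypothetical inputs (assumptions, not measured values from the paper):
#   compute[k][i]  seconds device k needs to run layer i
#   act_bytes[i]   size in bytes of layer i's output activation
#   throughput[k]  bytes/s of the link from device k to device k + 1


def block_time(times: Sequence[float], start: int, end: int) -> float:
    """Time one device needs to run layers [start, end)."""
    return sum(times[start:end])


def hop_time(act_bytes: Sequence[float], split: int, rate: float) -> float:
    """Time to send the activation produced by layer `split - 1` over one link."""
    return act_bytes[split - 1] / rate


def end_to_end_latency(splits, compute, act_bytes, throughput) -> float:
    """Chain latency: device k runs layers [bounds[k], bounds[k+1]), then forwards."""
    bounds = [0, *splits, len(act_bytes)]
    total = 0.0
    for k in range(len(compute)):
        total += block_time(compute[k], bounds[k], bounds[k + 1])
        if k < len(compute) - 1:  # the last device forwards nothing
            total += hop_time(act_bytes, bounds[k + 1], throughput[k])
    return total


def brute_force_splits(compute, act_bytes, throughput) -> Tuple[float, List[int]]:
    """Exhaustive baseline over all strictly increasing split-point tuples."""
    candidates = combinations(range(1, len(act_bytes)), len(compute) - 1)
    return min(
        (end_to_end_latency(list(s), compute, act_bytes, throughput), list(s))
        for s in candidates
    )


def beam_search_splits(compute, act_bytes, throughput, beam_width=3):
    """Place one split per hop, keeping the `beam_width` cheapest partial chains."""
    num_layers, num_devices = len(act_bytes), len(compute)
    beam: List[Tuple[float, List[int]]] = [(0.0, [])]
    for k in range(num_devices - 1):
        devices_left = num_devices - 1 - k  # each later device needs >= 1 layer
        expanded = []
        for latency, splits in beam:
            start = splits[-1] if splits else 0
            for s in range(start + 1, num_layers - devices_left + 1):
                cost = block_time(compute[k], start, s) + hop_time(act_bytes, s, throughput[k])
                expanded.append((latency + cost, splits + [s]))
        expanded.sort(key=lambda c: c[0])  # rank by accumulated latency only
        beam = expanded[:beam_width]
    # Close every surviving chain with the last device's block and keep the best.
    return min(
        (lat + block_time(compute[-1], s[-1] if s else 0, num_layers), s)
        for lat, s in beam
    )


# Toy example: 6 layers over 3 devices (all numbers are made up).
compute = [
    [0.50, 0.40, 0.30, 0.20, 0.20, 0.10],  # sensor node (slowest)
    [0.20, 0.15, 0.10, 0.10, 0.10, 0.05],
    [0.10, 0.08, 0.05, 0.05, 0.05, 0.02],
]
act_bytes = [4000, 2000, 1000, 800, 400, 40]
throughput = [2000.0, 2000.0]

print(beam_search_splits(compute, act_bytes, throughput, beam_width=2))
print(brute_force_splits(compute, act_bytes, throughput))
```

Because the beam ranks partial chains without estimating the cost of the layers still to be placed, a small `beam_width` can prune the optimal assignment. A width large enough to keep every partial chain degenerates into the exhaustive search. This trade-off between search cost and optimality is consistent with the abstract's "near-optimal" claim.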