HeteroEdge: Addressing Asymmetry in Heterogeneous Collaborative Autonomous Systems

Mohammad Saeid Anwar, Emon Dey, Maloy Kumar Devnath, Indrajeet Ghosh, Naima Khan, Jade Freeman, Timothy Gregory, Niranjan Suri, Kasthuri Jayaraja, Sreenivasan Ramasamy Ramamurthy, Nirmalya Roy

Gathering knowledge about surroundings and generating situational awareness for IoT devices are of utmost importance for systems developed for smart urban and uncontested environments. For example, a large-area surveillance system is typically equipped with multi-modal sensors such as cameras and LIDARs and is required to execute deep learning algorithms for action, face, behavior, and object recognition. However, these systems face power and memory constraints due to their ubiquitous nature, making it crucial to optimize data processing, deep learning algorithm input, and model inference communication. In this paper, we propose a self-adaptive optimization framework for a testbed comprising two Unmanned Ground Vehicles (UGVs) and two NVIDIA Jetson devices. This framework efficiently manages multiple tasks (storage, processing, computation, transmission, inference) on heterogeneous nodes concurrently. It involves compressing and masking input image frames, identifying similar frames, and profiling devices to obtain boundary conditions for optimization. Finally, we propose and optimize a novel parameter, split-ratio, which indicates the proportion of the data to be offloaded to another device, taking into account the networking bandwidth, busy factor, memory (CPU, GPU, RAM), and power constraints of the devices in the testbed. Our evaluations, captured while executing multiple tasks (e.g., PoseNet, SegNet, ImageNet, DetectNet, DepthNet) simultaneously, reveal that executing 70% of the data (split-ratio = 70%) on the auxiliary node reduces the offloading latency by approximately 33% (from 18.7 ms/image to 12.5 ms/image) and the total operation time by approximately 47% (from 69.32 s to 36.43 s) compared to the baseline configuration (executing everything on the primary node).
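To make the split-ratio concrete, the following minimal Python sketch partitions a batch of frames between a primary and an auxiliary node and recomputes the relative reductions from the figures quoted above. The function names `split_frames` and `relative_reduction` are hypothetical and are not taken from the paper; the sketch only illustrates what a fixed split-ratio means and does not reproduce the framework's optimization over bandwidth, busy factor, memory, and power constraints.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_frames(frames: Sequence[T], split_ratio: float) -> Tuple[List[T], List[T]]:
    """Partition frames into (kept on primary, offloaded to auxiliary).

    split_ratio is the fraction of frames sent to the auxiliary node.
    Hypothetical helper for illustration; not the paper's implementation.
    """
    if not 0.0 <= split_ratio <= 1.0:
        raise ValueError("split_ratio must lie in [0, 1]")
    n_offload = round(len(frames) * split_ratio)
    n_local = len(frames) - n_offload
    return list(frames[:n_local]), list(frames[n_local:])


def relative_reduction(baseline: float, optimized: float) -> float:
    """Percentage reduction of `optimized` relative to `baseline`."""
    return (baseline - optimized) / baseline * 100.0


# Ten placeholder frame ids, split with the 70% ratio reported in the abstract.
frames = list(range(10))
local, offloaded = split_frames(frames, 0.7)

# Figures quoted in the abstract: latency in ms/image, total time in seconds.
latency_gain = relative_reduction(18.7, 12.5)
time_gain = relative_reduction(69.32, 36.43)

print(f"primary: {len(local)} frames, auxiliary: {len(offloaded)} frames")
print(f"latency reduction: {latency_gain:.1f}%, time reduction: {time_gain:.1f}%")
```

The computed reductions agree with the rounded values of approximately 33% and 47% stated in the abstract.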
