3D object detection plays a pivotal role in a wide range of applications, most notably autonomous driving and robotics. These applications are commonly deployed on edge devices so that they can interact promptly with the environment, and they often require near real-time response. With limited computation power, it is challenging to execute 3D detection on the edge using highly complex neural networks. Common approaches such as offloading to the cloud incur latency overheads because of the large volume of 3D point cloud data that must be transmitted. To resolve the tension between wimpy edge devices and compute-intensive inference workloads, we explore the possibility of transforming fast 2D detection results to extrapolate 3D bounding boxes. To this end, we present Moby, a novel system that demonstrates the feasibility and potential of our approach. Our main contributions are two-fold. First, we design a 2D-to-3D transformation pipeline that takes as input the point cloud data from LiDAR and the 2D bounding boxes from a camera, captured at exactly the same time, and generates 3D bounding boxes efficiently and accurately based on the detection results of previous frames, without running 3D detectors. Second, we design a frame offloading scheduler that dynamically launches a 3D detection when the error of the 2D-to-3D transformation accumulates to a certain level, so that subsequent transformations can draw upon the latest 3D detection results with better accuracy. Extensive evaluation on NVIDIA Jetson TX2 with the autonomous driving dataset KITTI and real-world 4G/LTE traces shows that Moby reduces the end-to-end latency by up to 91.9% with a mild accuracy drop compared to baselines. Further, Moby shows excellent energy efficiency, reducing power consumption and memory footprint by up to 75.7% and 48.1%, respectively.
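The scheduling idea can be illustrated with a minimal sketch. The abstract does not say how the transformation error is measured or what threshold is used, so everything below is an assumption made for illustration: the error proxy (one minus the IoU between a 2D box projected from the estimated 3D box and the box from the 2D detector), the names `iou_2d` and `OffloadScheduler`, and the default threshold are hypothetical and are not taken from Moby.

```python
from dataclasses import dataclass
from typing import Tuple

# Axis-aligned 2D box as (x_min, y_min, x_max, y_max) in pixels.
Box2D = Tuple[float, float, float, float]


def iou_2d(a: Box2D, b: Box2D) -> float:
    """Intersection over union of two axis-aligned 2D boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


@dataclass
class OffloadScheduler:
    """Hypothetical scheduler: accumulate a per-frame error proxy and
    request a full 3D detection once it reaches a threshold."""

    threshold: float = 0.5   # assumed value, not from the paper
    accumulated: float = 0.0

    def observe(self, projected: Box2D, detected: Box2D) -> bool:
        """Add this frame's error; return True if a 3D detection should run."""
        # Assumed error proxy: disagreement between the projected 3D
        # estimate and the 2D detector output for the same object.
        self.accumulated += 1.0 - iou_2d(projected, detected)
        if self.accumulated >= self.threshold:
            # A fresh 3D detection would re-anchor later transformations,
            # so the accumulated error starts over.
            self.accumulated = 0.0
            return True
        return False
```

In this sketch the caller would invoke `observe` once per frame and trigger the full 3D detector whenever it returns `True`; frames for which it returns `False` continue to use the lightweight 2D-to-3D transformation.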