Sustaining high fidelity and high throughput for perception tasks over vision sensor streams on edge devices remains a formidable challenge, especially given the continuing increase in image sizes (e.g., those generated by 4K cameras) and in the complexity of DNN models. One promising approach is criticality-aware processing, where computation is directed selectively to the critical portions of individual image frames. We introduce MOSAIC, a novel system for such criticality-aware concurrent processing of multiple vision sensing streams that provides a multiplicative increase in achievable throughput with negligible loss in perception fidelity. MOSAIC determines critical regions in images received from multiple vision sensors and spatially bin-packs these regions, using a novel multi-scale Mosaic Across Scales (MoS) tiling strategy, into a single canvas frame, sized such that the edge device can retain sufficiently high processing throughput. Experimental studies using benchmark datasets for two tasks, Automatic License Plate Recognition and Drone-based Pedestrian Detection, show that MOSAIC, executing on a Jetson TX2 edge device, can provide dramatic gains in the throughput vs. fidelity tradeoff. For instance, for drone-based pedestrian detection with a batch size of 4, MOSAIC can pack input frames from 6 cameras to achieve 4.75x higher throughput (23 FPS per camera, cumulatively 138 FPS) with less than 1% accuracy loss, compared to a First Come First Serve (FCFS) processing paradigm.
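To make the packing idea concrete, the following is a minimal sketch of placing critical regions from several cameras into one fixed-size canvas. It is not the MoS tiling strategy itself, whose details are not given here. It substitutes a generic shelf (row-by-row) bin-packing heuristic and assumes a single uniform downscale factor is tried when the regions do not fit at full size. The names `Region`, `Placement`, `shelf_pack`, `pack_regions`, and the candidate scale list are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Region:
    """A critical region cropped from one camera frame (hypothetical type)."""
    camera_id: int
    width: int
    height: int


@dataclass(frozen=True)
class Placement:
    """Where a (possibly downscaled) region lands on the canvas."""
    region: Region
    x: int
    y: int
    width: int
    height: int
    scale: float


def shelf_pack(regions: Sequence[Region], canvas_w: int, canvas_h: int,
               scale: float) -> Optional[List[Placement]]:
    """Try to pack all regions at one scale; return None if they do not fit."""
    # Resize every region by the same factor (an assumption of this sketch).
    scaled = [(r, max(1, round(r.width * scale)), max(1, round(r.height * scale)))
              for r in regions]
    # Tallest first, so the first item on each shelf fixes the shelf height.
    scaled.sort(key=lambda item: item[2], reverse=True)

    placements: List[Placement] = []
    x = y = shelf_h = 0
    for region, w, h in scaled:
        if w > canvas_w or h > canvas_h:
            return None  # region cannot fit on the canvas at this scale
        if x + w > canvas_w:
            # Current shelf is full: open a new shelf below it.
            y += shelf_h
            x = 0
            shelf_h = 0
        if y + h > canvas_h:
            return None  # ran out of vertical space
        placements.append(Placement(region, x, y, w, h, scale))
        x += w
        shelf_h = max(shelf_h, h)
    return placements


def pack_regions(regions: Sequence[Region], canvas_w: int, canvas_h: int,
                 scales: Sequence[float] = (1.0, 0.75, 0.5, 0.25)
                 ) -> Optional[List[Placement]]:
    """Return the packing at the largest candidate scale that fits, else None."""
    for scale in scales:
        result = shelf_pack(regions, canvas_w, canvas_h, scale)
        if result is not None:
            return result
    return None
```

As an example, calling `pack_regions` with six 400x300 regions (one per camera) and a 640x640 canvas fails at scales 1.0 and 0.75 under this heuristic and succeeds at 0.5, where the regions form two rows of three. A fixed canvas size keeps the per-inference cost on the edge device bounded regardless of how many streams contribute regions, which is the property the abstract attributes to the canvas frame.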