Enhancing Predictability of Multi-Tenant DNN Inference for Autonomous Vehicles' Perception

Autonomous vehicles (AVs) rely on sensors and deep neural networks (DNNs) to perceive their surrounding environment and make maneuver decisions in real time. However, achieving real-time DNN inference in the AV's perception pipeline is challenging due to the large gap between the computation requirement and the AV's limited resources. Most, if not all, existing studies focus on optimizing the DNN inference time to achieve faster perception by compressing the DNN model with pruning and quantization. In contrast, we present a Predictable Perception system with DNNs (PP-DNN) that reduces the amount of image data to be processed while maintaining the same level of accuracy for multi-tenant DNNs by dynamically selecting critical frames and regions of interest (ROIs). PP-DNN is based on our key insight that critical frames and ROIs for AVs vary with the AV's surrounding environment. However, it is challenging to identify and use critical frames and ROIs in multi-tenant DNNs for predictable inference. Given image-frame streams, PP-DNN leverages an ROI generator to identify critical frames and ROIs based on the similarities of consecutive frames and traffic scenarios. PP-DNN then leverages a FLOPs predictor to predict multiply-accumulate operations (MACs) from the dynamic critical frames and ROIs. The ROI scheduler coordinates the processing of critical frames and ROIs with multiple DNN models. Finally, we design a detection predictor for the perception of non-critical frames. We have implemented PP-DNN in an ROS-based AV pipeline and evaluated it with the BDD100K and nuScenes datasets. PP-DNN is observed to significantly enhance perception predictability, increasing the number of fusion frames by up to 7.3x, reducing the fusion delay by more than 2.6x and fusion-delay variations by more than 2.3x, and improving detection completeness by 75.4% and cost-effectiveness by up to 98% over the baseline.

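To make the idea of selecting critical frames and ROIs from the similarity of consecutive frames more concrete, the following is a minimal sketch only. The abstract does not state which similarity measure or thresholds PP-DNN uses, so the mean absolute pixel difference, the bounding-box ROI, and all names here (`mean_abs_diff`, `changed_roi`, `select_critical_frames`, `frame_thresh`, `pixel_thresh`) are assumptions made for illustration. The sketch also ignores the traffic-scenario input that the ROI generator is described as using.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]           # grayscale image, pixel values 0..255
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), inclusive


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference (assumed similarity proxy)."""
    total = sum(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def changed_roi(a: Frame, b: Frame, pixel_thresh: int) -> Optional[Box]:
    """Bounding box of pixels whose value changed by more than pixel_thresh."""
    coords = [
        (y, x)
        for y, (ra, rb) in enumerate(zip(a, b))
        for x, (pa, pb) in enumerate(zip(ra, rb))
        if abs(pa - pb) > pixel_thresh
    ]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return (min(ys), min(xs), max(ys), max(xs))


def select_critical_frames(
    frames: List[Frame], frame_thresh: float = 5.0, pixel_thresh: int = 30
) -> List[Tuple[int, Box]]:
    """Return (frame index, ROI) pairs for frames treated as critical.

    Hypothetical rule: a frame is critical when it differs enough from the
    last critical frame; its ROI is the region that changed.
    """
    selected: List[Tuple[int, Box]] = []
    reference: Optional[Frame] = None
    for i, frame in enumerate(frames):
        full = (0, 0, len(frame) - 1, len(frame[0]) - 1)
        if reference is None:
            # The first frame has no predecessor, so process it in full.
            selected.append((i, full))
            reference = frame
        elif mean_abs_diff(frame, reference) > frame_thresh:
            # Fall back to the full frame if no single pixel stands out.
            roi = changed_roi(frame, reference, pixel_thresh) or full
            selected.append((i, roi))
            reference = frame
    return selected
```

In this sketch, frames that are not selected would be the non-critical frames whose results the abstract says are produced by a detection predictor rather than by running the DNNs.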