Autonomous vehicles (AVs) are envisioned to revolutionize our lives by providing safe, relaxing, and convenient ground transportation. The computing systems in such vehicles must interpret data from various sensors and respond to the environment in a timely manner to ensure driving safety. However, these timing-related safety requirements remain largely unexplored in prior work. In this paper, we conduct a systematic study of the timing requirements of AV systems, focusing on investigating and mitigating the sources of tail latency in Level-4 AV computing systems. We observe that the performance of AV algorithms is not uniform; instead, latency is susceptible to fluctuations in the vehicle's environment, such as traffic density. These fluctuations cause bursts of computation and memory access in response to traffic, which in turn lead to tail latency in the system. Furthermore, we observe that tail latency also arises from a mismatch between the pre-configured AV computation pipeline and the dynamic latency requirements of real-world driving scenarios. Based on these observations, we propose a set of system designs to mitigate AV tail latency. We demonstrate our design on two widely used industrial Level-4 AV systems, Baidu Apollo and Autoware. The evaluation shows that our design achieves a 1.65× improvement in worst-case latency and a 1.3× improvement in average latency, and avoids 93% of accidents on Apollo.