Planning under uncertainty for real-world robotics tasks, such as autonomous driving, requires reasoning over enormous, high-dimensional belief spaces, which makes the problem computationally intensive. While parallelization offers scalability, existing hybrid CPU-GPU solvers face critical bottlenecks from host-device synchronization latency and from branch divergence on SIMT architectures. These bottlenecks limit their utility for real-time planning and hinder deployment on real robots. We present Vec-QMDP, a CPU-native parallel planner that aligns POMDP search with the SIMD architecture of modern CPUs, achieving a $227\times$--$1073\times$ speedup over state-of-the-art serial planners. Vec-QMDP adopts Data-Oriented Design (DOD), refactoring scattered, pointer-based data structures into contiguous, cache-efficient memory layouts. We further introduce a hierarchical parallelism scheme that distributes sub-trees across independent CPU cores and SIMD lanes, enabling fully vectorized tree expansion and collision checking. Efficiency is further improved by UCB-based load balancing across trees and by a vectorized STR-tree for coarse-level collision checking. Evaluated on large-scale autonomous driving benchmarks, Vec-QMDP achieves state-of-the-art planning performance with millisecond-level latency, establishing CPUs as a high-performance computing platform for large-scale planning under uncertainty.
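To make the Data-Oriented Design idea concrete, the following is a minimal C++ sketch, not the Vec-QMDP implementation. It assumes belief particles are stored as a structure of arrays (one contiguous array per coordinate) rather than as pointer-linked objects, and it uses a single disc obstacle as a stand-in collision model. The names `ParticlesSoA` and `collide_disc` and the disc model are hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical structure-of-arrays (SoA) layout: each coordinate lives in its
// own contiguous array, instead of a vector of heap-allocated particle objects
// reached through pointers. Precondition: x and y have the same length.
struct ParticlesSoA {
    std::vector<float> x;
    std::vector<float> y;
};

// Hypothetical obstacle model: one disc of radius r centred at (cx, cy).
// Writes hit[i] = 1 if particle i lies inside or on the disc, else 0.
// The loop body is branch-free and reads unit-stride arrays, so an optimizing
// compiler targeting a SIMD ISA can map consecutive particles onto SIMD lanes;
// auto-vectorization is likely but not guaranteed.
void collide_disc(const ParticlesSoA& p, float cx, float cy, float r,
                  std::vector<std::uint8_t>& hit) {
    const std::size_t n = p.x.size();
    assert(p.y.size() == n);
    hit.resize(n);

    const float r2 = r * r;           // compare squared distances, no sqrt
    const float* px = p.x.data();     // raw pointers to contiguous storage
    const float* py = p.y.data();
    std::uint8_t* out = hit.data();

    for (std::size_t i = 0; i < n; ++i) {
        const float dx = px[i] - cx;
        const float dy = py[i] - cy;
        // Comparison result (0 or 1) stored directly: no data-dependent branch.
        out[i] = static_cast<std::uint8_t>(dx * dx + dy * dy <= r2);
    }
}
```

The design choice is the point of the sketch: because every particle's coordinates sit next to its neighbours' in memory and each iteration does identical arithmetic, one SIMD instruction can process several particles at once, which a pointer-chasing, object-per-particle layout would prevent.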