Chain-of-Thought (CoT) reasoning enhances the decision-making capabilities of vision-language-action models in autonomous driving, but its autoregressive nature introduces significant inference latency, making it impractical for real-time applications. To address this, we introduce FastDriveCoT, a novel parallel decoding method that accelerates template-structured CoT. Our approach decomposes the reasoning process into a dependency graph of distinct sub-tasks, such as identifying critical objects and summarizing traffic rules, some of which can be generated in parallel. By generating multiple independent reasoning steps concurrently within a single forward pass, we significantly reduce the number of sequential decoding steps. Experiments demonstrate a 3-4$\times$ speedup in CoT generation and a substantial reduction in end-to-end latency across various model architectures, while fully preserving the downstream task gains that CoT reasoning provides.
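The scheduling idea behind the dependency-graph decomposition can be sketched as a topological layering: sub-tasks whose dependencies are all satisfied form one "level" and can be decoded concurrently. This is a minimal illustrative sketch, not the paper's implementation; the sub-task names and dependency edges are hypothetical examples.

```python
# Hypothetical sketch of grouping CoT sub-tasks into parallelizable levels.
# Sub-task names and edges are illustrative assumptions, not from the paper.

def parallel_levels(deps):
    """deps maps each sub-task to the set of sub-tasks it depends on.

    Returns a list of levels; sub-tasks within a level have no unmet
    dependencies on each other and could be decoded concurrently in a
    single forward pass.
    """
    remaining = {task: set(d) for task, d in deps.items()}
    levels = []
    while remaining:
        # Sub-tasks whose dependencies are all resolved are ready now.
        ready = sorted(t for t, d in remaining.items() if not d)
        if not ready:
            raise ValueError("dependency cycle detected")
        levels.append(ready)
        for t in ready:
            del remaining[t]
        # Mark the just-generated sub-tasks as satisfied dependencies.
        for d in remaining.values():
            d.difference_update(ready)
    return levels

# Example graph loosely based on the sub-tasks named in the abstract:
deps = {
    "critical_objects": set(),
    "traffic_rules": set(),
    "scene_summary": set(),
    "risk_assessment": {"critical_objects", "traffic_rules"},
    "action_decision": {"risk_assessment", "scene_summary"},
}
```

With this toy graph, the three independent sub-tasks form the first level, so three of the five sequential stages collapse into one.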