Cyclists often encounter safety-critical situations in urban traffic, highlighting the need for assistive systems that support safe and informed decision-making. Recently, vision-language models (VLMs) have demonstrated strong performance on autonomous driving benchmarks, suggesting their potential for general traffic understanding and navigation-related reasoning. However, existing evaluations are predominantly vehicle-centric and fail to assess perception and reasoning from a cyclist's viewpoint. To address this gap, we introduce CyclingVQA, a diagnostic benchmark designed to probe perception, spatio-temporal understanding, and traffic-rule-to-lane reasoning from a cyclist's perspective. Evaluating 31+ recent VLMs spanning general-purpose, spatially enhanced, and autonomous-driving-specialized models, we find that current models show encouraging capabilities, but our evaluation also reveals clear weaknesses in cyclist-centric perception and reasoning, particularly in interpreting cyclist-specific traffic cues and associating signs with the correct navigational lanes. Notably, several driving-specialized models underperform strong generalist VLMs, indicating limited transfer from vehicle-centric training to cyclist-assistive scenarios. Finally, through systematic error analysis, we identify recurring failure modes to guide the development of more effective cyclist-assistive intelligent systems.