Visual and Cognitive Demands of a Large Language Model-Powered In-vehicle Conversational Agent

Driver distraction remains a leading contributor to motor vehicle crashes, necessitating rigorous evaluation of new in-vehicle technologies. This study assessed the visual and cognitive demands of an advanced Large Language Model (LLM) conversational agent (Gemini Live) during on-road driving, comparing it against hands-free phone calls, visual turn-by-turn guidance (low-load baseline), and the Operation Span (OSPAN) task (high-load anchor). Thirty-two licensed drivers completed five secondary tasks while their visual and cognitive demands were measured using the Detection Response Task (DRT) for cognitive load, eye tracking for visual attention, and subjective workload ratings. Results indicated that Gemini Live interactions (both single-turn and multi-turn) and hands-free phone calls imposed similar levels of cognitive load, falling between those of visual turn-by-turn guidance and OSPAN. Exploratory analysis showed that cognitive load remained stable across extended multi-turn conversations. Mean glance durations for all tasks were well below the widely accepted 2-second safety threshold, confirming low visual demand. Furthermore, during task completion drivers consistently interleaved brief off-road glances toward the device with longer glances to the roadway, particularly during voice-based interactions, which makes the longer total eyes-off-road times observed less consequential. Subjective ratings mirrored the objective data, with participants reporting low effort, demand, and perceived distraction for Gemini Live. These findings demonstrate that advanced LLM conversational agents, when implemented via voice interfaces, impose cognitive and visual demands comparable to established, low-risk hands-free benchmarks, supporting their safe deployment in the driving environment.
