Foundation models, including vision-language models, are increasingly used in automated driving to interpret scenes, recommend actions, and generate natural-language explanations. However, existing evaluation methods primarily assess outcome-based performance, such as safety and trajectory accuracy, without determining whether model decisions reflect human-relevant considerations. As a result, it remains unclear whether explanations produced by such models correspond to genuine reason-responsive decision making or are merely post hoc rationalizations. This limitation is especially significant in safety-critical domains because it can create false confidence. To address this gap, we propose CARE Drive (Context-Aware Reasons Evaluation for Driving), a model-agnostic framework for evaluating reason responsiveness in vision-language models applied to automated driving. CARE Drive compares baseline and reason-augmented model decisions under controlled contextual variation to assess whether human reasons causally influence decision behavior. The framework employs a two-stage evaluation process: prompt calibration first ensures stable outputs; systematic contextual perturbation then measures decision sensitivity to human reasons such as safety margins, social pressure, and efficiency constraints. We demonstrate CARE Drive in a cyclist-overtaking scenario involving competing normative considerations. Results show that explicit human reasons significantly influence model decisions, improving alignment with expert-recommended behavior. However, responsiveness varies across contextual factors, indicating uneven sensitivity to different types of reasons. These findings provide empirical evidence that reason responsiveness in foundation models can be systematically evaluated without modifying model parameters.
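To make the two-stage comparison concrete, the sketch below shows one way the baseline versus reason-augmented contrast could be run over a grid of contextual perturbations. It is a minimal illustration under stated assumptions, not the paper's implementation: the function names (query_vlm, reason_responsiveness), the context-factor values, and the reason text are all hypothetical, and the stubbed model call would be replaced by a calibrated client for a real vision-language model.

```python
# Minimal sketch of the CARE Drive comparison loop (illustrative only).
# All identifiers and factor values below are assumptions for exposition.

from itertools import product

# Hypothetical contextual perturbations of the cyclist-overtaking scenario,
# covering the three reason types named in the abstract.
CONTEXT_FACTORS = {
    "safety_margin":   ["narrow_gap", "wide_gap"],
    "social_pressure": ["tailgating_car", "clear_behind"],
    "efficiency":      ["on_schedule", "running_late"],
}

HUMAN_REASON = ("Overtaking now leaves too small a safety margin "
                "for the cyclist; a careful driver would wait.")

def query_vlm(context: dict[str, str], reason: str | None = None) -> str:
    """Stand-in for a calibrated VLM call (stage 1: prompt calibration).

    Replace this stub with a real model client. For illustration it
    overtakes by default but defers when a safety reason is supplied
    and the gap to oncoming traffic is narrow.
    """
    if reason and context["safety_margin"] == "narrow_gap":
        return "wait"
    return "overtake"

def reason_responsiveness() -> float:
    """Stage 2: fraction of perturbed contexts in which the explicit
    human reason changes the decision relative to the baseline prompt."""
    contexts = [dict(zip(CONTEXT_FACTORS, values))
                for values in product(*CONTEXT_FACTORS.values())]
    flips = sum(query_vlm(c) != query_vlm(c, reason=HUMAN_REASON)
                for c in contexts)
    return flips / len(contexts)

if __name__ == "__main__":
    print(f"decision flip rate under the human reason: "
          f"{reason_responsiveness():.2f}")
```

Because the loop only contrasts model outputs across prompts, this style of evaluation leaves model parameters untouched, which is the model-agnostic property the abstract claims.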