Driver observation frameworks are often first trained on clean datasets collected in controlled simulated environments. When deployed under real driving conditions, however, such simulator-trained models quickly face distributional shifts caused by changes in illumination, car model, subject appearance, sensor characteristics, and other environmental factors. This paper investigates the viability of transferring video-based driver observation models from simulation to real-world scenarios in autonomous vehicles, motivated by the frequent use of simulation data in this domain for safety reasons. To this end, we record a dataset under actual autonomous driving conditions with seven participants engaged in highly distracting secondary activities. To enable direct sim-to-real transfer, our dataset was designed to match an existing large-scale simulator dataset used as the training source. We use the Inflated 3D ConvNet (I3D) model, a popular choice for driver observation, together with Gradient-weighted Class Activation Mapping (Grad-CAM) for detailed analysis of model decision-making. Although the simulator-trained model clearly surpasses the random baseline, its recognition quality degrades substantially, with average accuracy dropping from 85.7% to 46.6%. We also observe strong variations across behavior classes. These results underscore the challenges of model transferability and motivate our research toward more robust driver observation systems capable of handling real driving conditions.
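As a rough illustration of how an average accuracy and a random baseline of this kind can be computed, the following minimal sketch averages per-class accuracies and compares the result to chance level. It assumes that "average accuracy" means the mean of per-class accuracies and that the random baseline is uniform guessing over the classes; the abstract does not confirm either. The class names and predictions are hypothetical toy values, not data from the paper.

```python
from collections import defaultdict


def per_class_accuracy(y_true, y_pred):
    """Return {class: fraction of that class's samples predicted correctly}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {cls: correct[cls] / total[cls] for cls in total}


def mean_per_class_accuracy(y_true, y_pred):
    """Average the per-class accuracies so every class counts equally."""
    accuracies = per_class_accuracy(y_true, y_pred)
    return sum(accuracies.values()) / len(accuracies)


def chance_level(num_classes):
    """Expected accuracy of uniform random guessing (assumed baseline)."""
    return 1.0 / num_classes


# Hypothetical toy labels; not taken from the paper's dataset.
y_true = ["eating", "eating", "texting", "texting", "reading", "reading"]
y_pred = ["eating", "texting", "texting", "texting", "eating", "reading"]

class_acc = per_class_accuracy(y_true, y_pred)
mean_acc = mean_per_class_accuracy(y_true, y_pred)
baseline = chance_level(len(class_acc))

print(class_acc)
print(f"mean per-class accuracy: {mean_acc:.3f}, chance level: {baseline:.3f}")
```

Reporting the per-class values alongside the mean makes variation across behavior classes visible, which a single averaged number would hide.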