Real-time cognitive load assessment from eye-tracking signals could enable adaptive human-centered AI in safety-critical applications such as driver vigilance monitoring and automated flight deck assistance. Two challenges persist: handling the frequent missing data caused by blinks and tracking failures, and efficiently modeling long-range temporal dependencies. We propose MambaGaze, a framework that addresses these challenges through 1) XMD encoding, which augments raw features with observation masks and time-deltas to explicitly model data uncertainty, and 2) bidirectional Mamba-2, which captures temporal dependencies with computational complexity linear in sequence length. Experiments on the CLARE and CL-Drive datasets under leave-one-subject-out evaluation show that MambaGaze achieves 76.8% and 73.1% accuracy, respectively, outperforming CNN, Transformer, ResNet, and VGG baselines by 4-12 percentage points. Edge deployment benchmarks on NVIDIA Jetson platforms demonstrate real-time inference at 43-68 FPS with power consumption below 7.5 W, confirming feasibility for wearable cognitive load monitoring.
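To make the XMD idea concrete, the following is a minimal sketch of one way to build such an encoding, not the MambaGaze implementation itself. It assumes missing samples are marked as NaN, fills them with zeros, and computes time-deltas with the "time since last observation" convention familiar from GRU-D style models. The function name `xmd_encode`, the zero-fill choice, and the channel ordering (values, mask, delta) are all illustrative assumptions.

```python
import numpy as np


def xmd_encode(x, timestamps):
    """Hypothetical XMD-style encoder (illustrative, not the paper's code).

    x          : (T, F) array of gaze features, NaN where a sample is missing
    timestamps : (T,) array of sample times in seconds
    returns    : (T, 3F) array laid out as [filled values | mask | time-delta]
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(timestamps, dtype=float)

    # M: 1 where the feature was observed, 0 where it is missing.
    mask = ~np.isnan(x)

    # D: elapsed time since each feature was last observed.
    delta = np.zeros_like(x)
    for i in range(1, len(t)):
        gap = t[i] - t[i - 1]
        # If the previous step was observed, the delta restarts at the gap;
        # otherwise the gap accumulates on top of the previous delta.
        delta[i] = np.where(mask[i - 1], gap, gap + delta[i - 1])

    # X: raw values with missing entries zero-filled (an assumption here).
    x_filled = np.where(mask, x, 0.0)

    return np.concatenate([x_filled, mask.astype(float), delta], axis=1)


# Toy example: two features, four samples, with a blink-like gap.
features = np.array([[1.0, 2.0],
                     [np.nan, 3.0],
                     [np.nan, np.nan],
                     [4.0, 5.0]])
times = np.array([0.0, 0.5, 1.0, 1.5])
encoded = xmd_encode(features, times)
print(encoded.shape)
```

The encoded sequence triples the feature dimension, so a downstream sequence model sees which values are real observations and how stale each feature is, rather than having to infer that from imputed values alone.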