Real-time Multi-person Eyeblink Detection in the Wild for Untrimmed Video

Real-time eyeblink detection in the wild serves a wide range of applications, including fatigue detection, face anti-spoofing, and emotion analysis. Existing research generally focuses on single-person cases in trimmed video. However, the multi-person scenario in untrimmed video is also important for practical applications and has not yet received adequate attention. To address this, we investigate this research field for the first time, with essential contributions on dataset, theory, and practice. In particular, we propose MPEblink, a large-scale dataset of 686 untrimmed videos with 8748 eyeblink events captured under multi-person conditions. The samples are drawn from unconstrained films to reveal "in the wild" characteristics. We also propose a real-time multi-person eyeblink detection method. Unlike existing counterparts, our method runs in a one-stage spatio-temporal way with end-to-end learning capacity. Specifically, it simultaneously addresses the sub-tasks of face detection, face tracking, and human instance-level eyeblink detection. This paradigm has two main advantages: (1) eyeblink features can be enhanced by the face's global context (e.g., head pose and illumination condition) through joint optimization and interaction, and (2) addressing these sub-tasks in parallel rather than sequentially saves considerable time, helping to meet the real-time running requirement. Experiments on MPEblink verify the essential challenges of real-time multi-person eyeblink detection in the wild for untrimmed video. Our method also outperforms existing approaches by large margins while running at a high inference speed.
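
To make "human instance-level eyeblink detection" in untrimmed video more concrete, the sketch below turns per-frame scores into per-person eyeblink events. It is only an illustration under stated assumptions, not the method proposed here: the abstract does not specify the output format, so the per-track closed-eye scores, the `threshold` and `min_len` parameters, and both function names are hypothetical.

```python
from typing import Dict, List, Tuple


def scores_to_blink_events(
    scores: List[float], threshold: float = 0.5, min_len: int = 1
) -> List[Tuple[int, int]]:
    """Group consecutive frames with score >= threshold into events.

    Assumption: `scores` holds one closed-eye score per frame for a single
    tracked face. Each event is an inclusive (start_frame, end_frame) pair.
    """
    events: List[Tuple[int, int]] = []
    start = None  # first frame of the run currently being collected
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run ended on the previous frame; keep it if long enough.
            if i - start >= min_len:
                events.append((start, i - 1))
            start = None
    # Close a run that extends to the last frame of the video.
    if start is not None and len(scores) - start >= min_len:
        events.append((start, len(scores) - 1))
    return events


def blink_events_per_person(
    track_scores: Dict[int, List[float]],
    threshold: float = 0.5,
    min_len: int = 1,
) -> Dict[int, List[Tuple[int, int]]]:
    """Apply the grouping independently to every face track (hypothetical IDs)."""
    return {
        track_id: scores_to_blink_events(scores, threshold, min_len)
        for track_id, scores in track_scores.items()
    }
```

Keeping events keyed by track ID reflects the instance-level setting: each person in a multi-person video gets a separate list of eyeblink events, and a person who never blinks simply gets an empty list.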
