Eye blinking detection in the wild plays an essential role in applications such as deception detection and driver fatigue detection. Despite numerous attempts, most existing methods still face two difficulties: the extracted eye images vary in resolution as the distance between the face and the camera changes, and real-time operation requires a lightweight detection model with a short inference time. This research therefore addresses two problems: how an eye blinking detection model can learn efficiently from eye images of different resolutions captured in diverse conditions, and how to reduce the size of the detection model for faster inference. For the first problem, we propose upsampling or downsampling the input eye images to a common resolution, and we investigate which interpolation method yields the highest detection performance. For the second problem, although a recent spatiotemporal convolutional neural network for eye blinking detection has a strong capacity to extract both spatial and temporal features, it still has a large number of parameters, which leads to a long inference time. We therefore consider replacing the conventional convolution layers inside each branch with depth-wise separable convolutions as a feasible solution.
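To make the two ideas above concrete, the following is a minimal sketch in plain Python, not the implementation used in this work. It assumes a single-channel eye image stored as a list of rows, uses bilinear interpolation as one example of the interpolation methods that could be compared, and counts weights for a 2D convolution without bias terms. The function names `resize_bilinear`, `standard_conv_params`, and `separable_conv_params`, as well as the kernel size and channel counts in the usage note, are hypothetical and chosen only for illustration.

```python
def resize_bilinear(img, out_h, out_w):
    """Resize a 2D grayscale image (list of rows) to out_h x out_w
    using bilinear interpolation. Works for both up- and downsampling."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for i in range(out_h):
        # Map the output pixel centre back into input coordinates,
        # clamped to the valid range of input rows.
        y = min(max((i + 0.5) * in_h / out_h - 0.5, 0.0), in_h - 1.0)
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        wy = y - y0
        row = []
        for j in range(out_w):
            x = min(max((j + 0.5) * in_w / out_w - 0.5, 0.0), in_w - 1.0)
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            wx = x - x0
            # Interpolate horizontally on the two neighbouring rows,
            # then vertically between the two results.
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bottom = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out


def standard_conv_params(k, c_in, c_out):
    """Weights in a conventional k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k, c_in, c_out):
    """Weights in a depth-wise separable convolution (bias ignored):
    one k x k filter per input channel, then a 1 x 1 point-wise mix."""
    depthwise = k * k * c_in
    pointwise = c_in * c_out
    return depthwise + pointwise
```

As an illustration with assumed sizes, a 3 x 3 layer mapping 64 channels to 128 channels has 3·3·64·128 = 73,728 weights as a conventional convolution and 3·3·64 + 64·128 = 8,768 weights as a depth-wise separable one, roughly 12% of the original. In general the ratio is 1/c_out + 1/k², which is why the substitution shrinks the model most when layers are wide.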