The number of traffic accidents worldwide has risen continuously in recent years. Many accidents are caused by distracted drivers, whose attention is diverted from driving. Motivated by the success of Convolutional Neural Networks (CNNs) in computer vision, many researchers have developed CNN-based algorithms that recognize distracted driving from dashcam images and warn the driver against unsafe behaviors. However, current models have too many parameters to be feasible for vehicle-mounted computing. This work proposes a novel knowledge-distillation-based framework to solve this problem. The framework first constructs a high-performance teacher network by progressively strengthening robustness to illumination changes from the shallow to the deep layers of a CNN. The teacher network then guides the architecture search of a student network through knowledge distillation. Finally, the teacher network is used again to transfer knowledge to the resulting student network, also through knowledge distillation. Experimental results on the Statefarm Distracted Driver Detection Dataset and AUC Distracted Driver Dataset show that the proposed approach is highly effective for recognizing distracted driving behaviors from photos: (1) the teacher network's accuracy surpasses the previous best accuracy; (2) the student network achieves very high accuracy with only 0.42M parameters (around 55% of the parameter count of the previous most lightweight model). Furthermore, the student network architecture can be extended to a spatial-temporal 3D CNN for recognizing distracted driving from video clips. On the Drive&Act Dataset, the 3D student network largely surpasses the previous best accuracy with only 2.03M parameters. The source code is available at https://github.com/Dichao-Liu/Lightweight_Distracted_Driver_Recognition_with_Distillation-Based_NAS_and_Knowledge_Transfer.
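To make the teacher-to-student transfer step more concrete, the following is a minimal sketch of a generic knowledge-distillation loss in plain Python. It assumes the widely used soft-target formulation (temperature-scaled KL divergence between teacher and student outputs, mixed with cross-entropy on the true label), which may differ from the loss actually used in this work. The function names and the `temperature` and `alpha` parameters are illustrative choices, not taken from the released code.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max for numerical stability."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Generic soft-target distillation loss (an assumption, not the paper's exact loss).

    alpha weights the soft-target term; (1 - alpha) weights the hard-label term.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # Soft-target term: KL(teacher || student) at the given temperature.
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Hard-label term: cross-entropy of the student at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-term gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Hypothetical logits for a 3-class toy problem (not real model outputs).
teacher = [4.0, 1.0, 0.5]
good_student = [3.5, 1.2, 0.4]
poor_student = [0.5, 3.0, 1.0]
print(distillation_loss(good_student, teacher, label=0))
print(distillation_loss(poor_student, teacher, label=0))
```

In this toy setup, a student whose logits track the teacher's receives a lower loss than one that disagrees with both the teacher and the label, which is the signal a teacher could provide during either architecture search or final training.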