Human action recognition in dark videos is a challenging task in computer vision. Recent research focuses on applying dark enhancement methods to improve the visibility of videos. However, such processing results in the loss of critical information present in the original (un-enhanced) video. Conversely, traditional two-stream methods can learn from both original and processed videos, but running two streams significantly increases the computational cost of inference in video classification. To address these challenges, we propose a novel teacher-student video classification framework, named Dual-Light KnowleDge Distillation for Action Recognition in the Dark (DL-KDD). This framework enables the model to learn from both original and enhanced videos without introducing additional computational cost during inference. Specifically, DL-KDD applies knowledge distillation during training: the teacher model is trained on enhanced videos, and the student model is trained on the original videos together with the soft targets generated by the teacher model. As a result, the student model predicts actions from the original input video alone during inference. In our experiments, DL-KDD outperforms state-of-the-art methods on the ARID, ARID V1.5, and Dark-48 datasets. It achieves the best performance on each dataset, with an improvement of up to 4.18% on Dark-48, while using only original video inputs and thus requiring neither a two-stream framework nor an enhancement module at inference. We further validate the effectiveness of the distillation strategy through ablation experiments. The results highlight the advantages of our knowledge distillation framework for human action recognition in the dark.
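The teacher-student training described above can be illustrated with a generic soft-target distillation loss. This is a minimal sketch, assuming the standard Hinton-style formulation (hard-label cross-entropy plus a temperature-scaled KL term against the teacher's soft targets). The abstract does not state DL-KDD's exact loss, so the function names, the temperature `temperature`, and the weight `alpha` are hypothetical choices for illustration only. In this framing, `teacher_logits` would come from the teacher applied to the enhanced clip and `student_logits` from the student applied to the original clip.

```python
import numpy as np


def log_softmax(logits, temperature=1.0):
    """Numerically stable temperature-scaled log-softmax over the last axis."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # shift for stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Hypothetical Hinton-style KD loss (not necessarily DL-KDD's exact loss).

    loss = alpha * CE(student, labels)
         + (1 - alpha) * T^2 * KL(teacher_T || student_T)
    """
    labels = np.asarray(labels)

    # Hard-label cross-entropy on the student's (original-video) predictions.
    log_p_student = log_softmax(student_logits)
    n = log_p_student.shape[0]
    ce = -log_p_student[np.arange(n), labels].mean()

    # Soft targets from the teacher (enhanced-video) predictions at temperature T.
    log_p_teacher = log_softmax(teacher_logits, temperature)
    p_teacher = np.exp(log_p_teacher)
    log_q_student = log_softmax(student_logits, temperature)
    kl = (p_teacher * (log_p_teacher - log_q_student)).sum(axis=-1).mean()

    # T^2 keeps the soft-term gradient scale comparable across temperatures.
    return alpha * ce + (1.0 - alpha) * temperature ** 2 * kl


# Toy usage: two clips, three action classes (made-up logits).
student = np.array([[2.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
teacher = np.array([[3.0, 0.5, 0.0], [0.0, 2.0, 1.0]])
labels = np.array([0, 1])
loss = distillation_loss(student, teacher, labels)
```

Only the student's forward pass on the original video is needed at inference. The teacher and the enhancement step are used solely to produce soft targets during training, which is why this style of distillation adds no inference cost.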