Semi-supervised learning suffers from the imbalance between labeled and unlabeled training data in video surveillance scenarios. In this paper, we propose SIAVC, a new semi-supervised learning method for industrial accident video classification. Specifically, we design a video augmentation module called the Super Augmentation Block (SAB), which adds Gaussian noise and randomly masks video frames according to the historical loss on the unlabeled data to aid model optimization. We then propose a Video Cross-set Augmentation Module (VCAM) that generates diverse pseudo-label samples from high-confidence unlabeled samples, which alleviates the mismatch of sampling experience and provides high-quality training data. Additionally, we construct ECA9, a new industrial accident surveillance video dataset with frame-level annotation, to evaluate the proposed method. Compared with state-of-the-art semi-supervised learning-based methods, SIAVC achieves 88.76\% and 89.13\% video classification accuracy on the ECA9 and Fire Detection datasets, respectively. The source code and the ECA9 dataset will be released at \url{https://github.com/AlchemyEmperor/SIAVC}.
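To make the SAB idea concrete, the following is a minimal sketch of loss-guided augmentation: Gaussian noise plus random frame masking, with a strength derived from the loss history on unlabeled data. It is not the authors' implementation. The abstract does not specify how the historical loss controls the augmentation, so the mapping `strength = 1 / (1 + mean recent loss)`, the function names, and the parameters `window`, `max_sigma`, and `max_mask_ratio` are all assumptions made for illustration.

```python
import numpy as np


def augmentation_strength(loss_history, window=5):
    """Map recent unlabeled-data losses to a strength in (0, 1].

    Assumption (not from the paper): a lower recent loss permits a
    stronger augmentation, via strength = 1 / (1 + mean recent loss).
    """
    if len(loss_history) == 0:
        return 0.5  # hypothetical default before any loss is recorded
    recent = np.asarray(loss_history[-window:], dtype=np.float64)
    return float(1.0 / (1.0 + recent.mean()))


def augment_clip(video, loss_history, rng, max_sigma=0.1, max_mask_ratio=0.5):
    """Add Gaussian noise to a clip and zero out randomly chosen frames.

    video: float array of shape (T, H, W, C) with values in [0, 1].
    Returns the augmented clip and the sorted indices of the masked frames.
    """
    strength = augmentation_strength(loss_history)
    num_frames = video.shape[0]

    # Gaussian noise whose standard deviation scales with the strength.
    noisy = video + rng.normal(0.0, max_sigma * strength, size=video.shape)
    noisy = np.clip(noisy, 0.0, 1.0)

    # Mask a strength-dependent number of whole frames, chosen at random.
    num_masked = int(round(max_mask_ratio * strength * num_frames))
    masked_idx = rng.choice(num_frames, size=num_masked, replace=False)
    noisy[masked_idx] = 0.0
    return noisy, np.sort(masked_idx)


# Example: a random 16-frame clip and a short, made-up loss history.
rng = np.random.default_rng(0)
clip = rng.random((16, 32, 32, 3))
augmented, masked = augment_clip(clip, loss_history=[1.0, 1.0], rng=rng)
```

With a mean recent loss of 1.0 the assumed strength is 0.5, so a quarter of the 16 frames are masked under the default `max_mask_ratio`. A different schedule, for example one that strengthens augmentation as the loss rises, could be substituted by changing only `augmentation_strength`.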