Transformer-based deep networks have increasingly shown significant advantages over CNNs, and some existing work has applied them to wildfire recognition and detection. However, we observe that the vanilla Transformer is poorly suited to extracting smoke features. Low-level information such as color, transparency, and texture is very important for smoke recognition, whereas the Transformer attends mainly to the semantic relevance between middle- and high-level features and is insensitive to subtle spatial changes in low-level features. To address this, we propose the Cross Contrast Patch Embedding (CCPE) module, built on the Swin Transformer, which uses multi-scale spatial frequency contrast information in both the vertical and horizontal directions to improve the network's discrimination of low-level details. A second challenge in wildfire detection is the fuzzy boundary of smoke, which makes the assignment of positive and negative labels to instances ambiguous. To address this, we propose a Separable Negative Sampling Mechanism (SNSM). By applying two different negative-instance sampling strategies to positive images and negative images respectively, SNSM alleviates the confusion in supervision signals caused by label diversity during network training. This paper also releases RealFire Test, the largest real wildfire test set to date, to evaluate the proposed method and promote future research. It contains 50,535 images from 3,649 video clips. The proposed method has been extensively tested and evaluated on the RealFire Test dataset and shows a significant performance improvement over the baseline detection models.
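The abstract gives no formula for CCPE, so the following is only a minimal sketch of the general idea it names: contrast taken along the horizontal and vertical directions at several spatial scales. It assumes that "contrast" can be approximated by the absolute difference between a pixel and a neighbour a fixed number of pixels away, and that "multi-scale" means repeating this for several offsets. The function names `directional_contrast` and `multi_scale_contrast`, the `shifts` parameter, and the border handling are all hypothetical; this is not the authors' module.

```python
from typing import List, Sequence, Tuple

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def directional_contrast(img: Image, shift: int) -> Tuple[Image, Image]:
    """Return (horizontal, vertical) absolute-difference maps at one offset.

    Assumption: contrast is |I(y, x) - I(y, x + shift)| horizontally and
    |I(y, x) - I(y + shift, x)| vertically. Neighbours that fall outside
    the image are clamped to the nearest border pixel.
    """
    h, w = len(img), len(img[0])
    horiz = [[abs(img[y][x] - img[y][min(x + shift, w - 1)]) for x in range(w)]
             for y in range(h)]
    vert = [[abs(img[y][x] - img[min(y + shift, h - 1)][x]) for x in range(w)]
            for y in range(h)]
    return horiz, vert


def multi_scale_contrast(img: Image, shifts: Sequence[int] = (1, 2, 4)) -> List[Image]:
    """Collect horizontal and vertical contrast maps for every offset.

    The result is ordered [horiz(s0), vert(s0), horiz(s1), vert(s1), ...].
    """
    channels: List[Image] = []
    for s in shifts:
        horiz, vert = directional_contrast(img, s)
        channels.extend([horiz, vert])
    return channels


# Toy 4x4 image with a soft vertical edge, loosely mimicking a smoke boundary.
toy = [
    [0.0, 0.1, 0.5, 0.9],
    [0.0, 0.1, 0.5, 0.9],
    [0.0, 0.1, 0.5, 0.9],
    [0.0, 0.1, 0.5, 0.9],
]
features = multi_scale_contrast(toy, shifts=(1, 2))
```

In this toy case the horizontal maps respond to the gradual edge while the vertical maps stay at zero, because every row is identical. In a real network such maps would presumably be computed on feature tensors and fused with the ordinary patch embedding, but the abstract does not specify how.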