Building detection and change detection from remote sensing images support urban planning and rescue planning, and they can also be used to assess building damage after natural disasters. Most existing building detection models use only one image (the pre-disaster image) to detect buildings, on the assumption that post-disaster images reduce model performance because of the presence of destroyed buildings. In this paper, we propose a Siamese model, called SiamixFormer, which takes both pre- and post-disaster images as input. The model has two encoders arranged in a hierarchical transformer architecture. The output of each stage in both encoders is passed to a temporal transformer for feature fusion, in which the query is generated from the pre-disaster image and the (key, value) pair is generated from the post-disaster image. In this way, temporal features are also taken into account during feature fusion. A further advantage of using temporal transformers for feature fusion is that, compared with CNNs, they better preserve the large receptive fields produced by the transformer encoders. Finally, the output of the temporal transformer at each stage is passed to a simple MLP decoder. SiamixFormer is evaluated on the xBD and WHU datasets for building detection and on the LEVIR-CD and CDD datasets for change detection, where it outperforms state-of-the-art methods.
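The fusion step described above, with queries taken from pre-disaster features and keys and values taken from post-disaster features, can be illustrated with a minimal single-head cross-attention sketch in NumPy. This is not the authors' implementation: the function name `temporal_cross_attention`, the token count, the channel sizes, and the random projection matrices are all assumptions chosen for illustration, and a real model would add learned weights, multiple heads, normalization, and residual connections.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def temporal_cross_attention(pre_feat, post_feat, w_q, w_k, w_v):
    """Hypothetical single-head fusion of two feature maps from one encoder stage.

    pre_feat, post_feat: arrays of shape (num_tokens, channels).
    w_q, w_k, w_v: projection matrices of shape (channels, head_dim).
    """
    q = pre_feat @ w_q    # queries come from the pre-disaster features
    k = post_feat @ w_k   # keys come from the post-disaster features
    v = post_feat @ w_v   # values come from the post-disaster features
    scores = q @ k.T / np.sqrt(q.shape[-1])  # scaled dot-product similarity
    attn = softmax(scores, axis=-1)          # each query row sums to 1
    return attn @ v, attn


# Illustrative sizes only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
num_tokens, channels, head_dim = 16, 32, 8
pre = rng.standard_normal((num_tokens, channels))
post = rng.standard_normal((num_tokens, channels))
w_q, w_k, w_v = (rng.standard_normal((channels, head_dim)) * 0.1 for _ in range(3))

fused, attn = temporal_cross_attention(pre, post, w_q, w_k, w_v)
print(fused.shape, attn.shape)
```

Because every pre-disaster token attends to every post-disaster token, the fused features keep a global receptive field. In a hierarchical setup, one such fusion would be applied per encoder stage before the features reach the decoder.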