Lake extraction from remote sensing images is challenging due to complex lake shapes and inherent data noise. Existing methods suffer from blurred segmentation boundaries and poor foreground modeling. This paper proposes a hybrid CNN-Transformer architecture, called LEFormer, for accurate lake extraction. LEFormer contains three main modules: a CNN encoder, a Transformer encoder, and a cross-encoder fusion module. The CNN encoder effectively recovers local spatial information and improves fine-scale details. Simultaneously, the Transformer encoder captures long-range dependencies between sequences of any length, allowing it to obtain global features and context information. The cross-encoder fusion module integrates the local and global features to improve mask prediction. Experimental results show that LEFormer consistently achieves state-of-the-art performance and efficiency on the Surface Water and the Qinghai-Tibet Plateau Lake datasets. Specifically, LEFormer achieves 90.86% and 97.42% mIoU on the two datasets, respectively, with only 3.61M parameters, while being 20× smaller than the previous best lake extraction method. The source code is available at https://github.com/BastianChen/LEFormer.
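The headline results are mean Intersection-over-Union (mIoU) scores. As a minimal sketch, assuming the standard definition (per-class IoU = TP / (TP + FP + FN), averaged over classes, here taken to be background and lake), the metric can be computed from flattened label maps as below. The function name `mean_iou` and the toy masks are hypothetical illustrations, not code from the LEFormer repository.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int = 2) -> float:
    """Mean IoU for flattened label maps (assumed labels: 0 = background, 1 = lake)."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    # Confusion matrix: rows index the ground-truth class, columns the predicted class.
    conf = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        conf[t][p] += 1
    ious = []
    for c in range(num_classes):
        tp = conf[c][c]
        fp = sum(conf[r][c] for r in range(num_classes)) - tp  # predicted c, truly other
        fn = sum(conf[c]) - tp                                 # truly c, predicted other
        denom = tp + fp + fn
        # Skip classes absent from both prediction and ground truth.
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Hypothetical 3x3 masks flattened row by row; one lake pixel is missed.
    target = [0, 0, 1,
              0, 1, 1,
              0, 1, 1]
    pred = [0, 0, 1,
            0, 0, 1,
            0, 1, 1]
    print(f"mIoU = {mean_iou(pred, target):.4f}")  # prints "mIoU = 0.8000"
```

Because mIoU averages over classes rather than pixels, the lake class weighs as much as the usually larger background class; whether the reported scores average over exactly these two classes is an assumption of this sketch.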