Convolutional neural networks (CNNs) are widely used for hierarchical feature extraction. Vision transformers (ViTs) use a self-attention mechanism and have recently surpassed CNNs in modeling global contextual information. However, ViTs need large training datasets to realize their full strength in image classification. When training data are limited, advanced multi-layer perceptrons (MLPs) can offer viable alternatives to both deep CNNs and ViTs. In this paper, we develop SGU-MLP, a learning algorithm that combines MLPs with spatial gating units (SGUs) for accurate land use/land cover (LULC) mapping. We tested SGU-MLP in three experiments: Houston, USA; Berlin, Germany; and Augsburg, Germany. In all three, SGU-MLP consistently outperformed the benchmark CNN and CNN-ViT-based models, namely HybridSN, ResNet, iFormer, EfficientFormer and CoAtNet. For example, in the Houston experiment, SGU-MLP significantly outperformed HybridSN, CoAtNet, EfficientFormer, iFormer and ResNet in average accuracy, by approximately 15%, 19%, 20%, 21% and 25%, respectively. The code will be made publicly available at https://github.com/aj1365/SGUMLP.
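The spatial gating unit comes from the gMLP architecture (Liu et al., 2021, "Pay Attention to MLPs"). The following NumPy sketch shows the general idea behind an SGU. It is an illustration of the gMLP formulation, not the SGU-MLP authors' implementation. The SGU splits the channels of a token sequence into two halves, normalizes one half, and projects it linearly along the token (spatial) axis. It then uses the result to gate the other half element-wise. The function names, the tensor sizes, and the choice to treat a flattened 5x5 pixel patch as 25 tokens are all hypothetical assumptions made for this example. LayerNorm here has no learned scale or shift.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token (row) over its channels; no learned affine parameters in this sketch.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    # Tanh approximation of the GELU activation.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def spatial_gating_unit(z, w_spatial, b_spatial):
    """gMLP-style spatial gating (illustrative sketch, not the SGU-MLP source code).

    z:         (n_tokens, d) array, d even
    w_spatial: (n_tokens, n_tokens) matrix that mixes information across tokens
    b_spatial: (n_tokens,) bias
    """
    z1, z2 = np.split(z, 2, axis=-1)             # split channels into two halves
    z2 = layer_norm(z2)
    gate = w_spatial @ z2 + b_spatial[:, None]   # linear projection along the token axis
    return z1 * gate                             # element-wise gating of the first half

def gmlp_block(x, u, v, w_spatial, b_spatial):
    """One gMLP-style block: channel MLP with an SGU in the middle, plus a residual.

    x: (n_tokens, d_model); u: (d_model, d_ffn); v: (d_ffn // 2, d_model)
    """
    h = gelu(layer_norm(x) @ u)                      # channel projection up to d_ffn
    h = spatial_gating_unit(h, w_spatial, b_spatial) # -> (n_tokens, d_ffn // 2)
    return x + h @ v                                 # project back and add the residual

# Hypothetical sizes: a 5x5 pixel patch flattened into 25 tokens with 16 features each.
rng = np.random.default_rng(0)
n_tokens, d_model, d_ffn = 25, 16, 64
x = rng.standard_normal((n_tokens, d_model))
u = rng.standard_normal((d_model, d_ffn)) * 0.1
v = rng.standard_normal((d_ffn // 2, d_model)) * 0.1
w_spatial = np.zeros((n_tokens, n_tokens))  # gMLP initializes W near zero ...
b_spatial = np.ones(n_tokens)               # ... and b to one, so the gate starts near identity
y = gmlp_block(x, u, v, w_spatial, b_spatial)
print(y.shape)  # (25, 16)
```

With W at zero and b at one, the gate is exactly one, and the SGU simply passes the first channel half through unchanged. During training, W learns how much each token should draw on the others. The gMLP paper describes this near-identity initialization as a way to stabilize early training.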