Efficient Building Roof Type Classification: A Domain-Specific Self-Supervised Approach

Accurate classification of building roof types from aerial imagery is crucial for remote sensing applications such as urban planning, disaster management, and infrastructure monitoring. However, supervised approaches to this task are often hindered by the limited availability of labeled data. To address this challenge, this paper investigates the effectiveness of self-supervised learning with EfficientNet architectures, known for their computational efficiency, for building roof type classification. We propose a novel framework that incorporates a Convolutional Block Attention Module (CBAM) to enhance the feature extraction capabilities of EfficientNet. Furthermore, we explore the benefits of pretraining on a domain-specific dataset, the Aerial Image Dataset (AID), compared to ImageNet pretraining. In our experiments, the Simple Framework for Contrastive Learning of Visual Representations (SimCLR) with EfficientNet-B3 and CBAM achieves 95.5% accuracy on our validation set, matching the performance of state-of-the-art transformer-based models while using significantly fewer parameters. We also provide a comprehensive evaluation on two challenging test sets, demonstrating the generalization capability of our method. Notably, domain-specific pretraining consistently yields higher accuracy than pretraining on the generic ImageNet dataset. Our work establishes EfficientNet-based self-supervised learning as a computationally efficient and highly effective approach for building roof type classification, particularly in scenarios with limited labeled data.
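
As an illustration of the contrastive objective behind SimCLR, the following is a minimal sketch of the normalized temperature-scaled cross-entropy (NT-Xent) loss, written in plain Python for readability. It is not the authors' implementation: the function names, the temperature value of 0.5, and the toy embeddings are assumptions chosen for illustration, and in practice the embeddings would come from the EfficientNet encoder and projection head applied to two augmented views of each aerial image.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def nt_xent_loss(z_a, z_b, temperature=0.5):
    """NT-Xent loss over N positive pairs (z_a[i], z_b[i]).

    z_a and z_b are lists of embedding vectors for two augmented views
    of the same N images. The temperature default is an illustrative
    assumption, not a value taken from the paper.
    """
    n = len(z_a)
    z = [l2_normalize(v) for v in list(z_a) + list(z_b)]
    total = 0.0
    for i in range(2 * n):
        j = (i + n) % (2 * n)  # index of the positive partner of sample i
        # Cosine similarity of sample i to every sample, scaled by temperature.
        sims = [
            sum(a * b for a, b in zip(z[i], z[k])) / temperature
            for k in range(2 * n)
        ]
        # Denominator: log-sum-exp over all samples except i itself.
        others = [sims[k] for k in range(2 * n) if k != i]
        m = max(others)
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))
        total += log_denom - sims[j]
    return total / (2 * n)


# Toy example: two "images", each with two views embedded in 2-D.
aligned = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = nt_xent_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)
```

The loss is lower when the two views of the same image map to nearby embeddings than when they are mismatched, which is the property that lets the encoder learn from unlabeled imagery before fine-tuning on the limited labeled roof data.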
