A Comparative Study of Knowledge Transfer Methods for Misaligned Urban Building Labels

From arXiv. This work has been submitted to Elsevier for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

Misalignment between Earth observation (EO) images and building labels impairs the training of accurate convolutional neural networks (CNNs) for semantic segmentation of building footprints. Recently, three Teacher-Student knowledge transfer methods have been introduced to address this issue: supervised domain adaptation (SDA), knowledge distillation (KD), and deep mutual learning (DML). However, these methods have rarely been studied across different urban building types (low-rise, mid-rise, high-rise, and skyscrapers), where misalignment increases with building height and spatial resolution. In this study, we present a workflow for a systematic comparison of the three methods. The workflow first identifies the best-performing (highest evaluation scores) hyperparameters, lightweight CNNs for the Student (selected from 43 CNNs used in computer vision), and encoder-decoder networks (EDNs) for both Teachers and Students. Second, three building footprint datasets are developed to train and evaluate the identified Teachers and Students with the three transfer methods. The results show that U-Net with VGG19 (U-VGG19) is the best Teacher, and that U-EfficientNetv2B3 and U-EfficientNet-lite0 are among the best Students. With these Teacher-Student pairs, SDA achieves F1 scores of up to 0.943, 0.868, 0.912, and 0.697 for low-rise, mid-rise, high-rise, and skyscraper buildings, respectively. KD and DML achieve model compression of up to 82% with only a marginal loss in performance. This comparison concludes that SDA is the most effective method for addressing the misalignment problem, while KD and DML can efficiently compress network size without a significant loss in performance. The 158 experiments and the datasets developed in this study should be valuable for mitigating the effect of misaligned labels.
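The abstract does not spell out the training objectives, so the following is a minimal sketch, assuming per-pixel binary (building vs. background) outputs flattened into plain lists, of how the two compression-oriented methods are commonly formulated: Hinton-style KD (a weighted hard-label cross-entropy plus a temperature-scaled KL term towards a frozen Teacher) and DML in the style of Zhang et al. (two peers, each trained with cross-entropy plus a KL term towards the other). It also shows the pixel-level F1 score used for evaluation, and compression measured as the relative reduction in parameter count, which is an assumption about how the 82% figure is defined. All function names, the temperature `T`, the weight `alpha`, the parameter counts, and the toy logits are hypothetical and are not the paper's settings. SDA is not sketched because the abstract gives no detail of its formulation.

```python
import math

EPS = 1e-7  # clamp probabilities away from 0 and 1 to avoid log(0)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def _clamp(p: float) -> float:
    return min(max(p, EPS), 1.0 - EPS)


def bce(probs, labels):
    """Mean per-pixel binary cross-entropy against (possibly misaligned) labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = _clamp(p)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(probs)


def kl_bernoulli(p_target, p_pred):
    """Mean per-pixel KL(target || pred) between Bernoulli distributions."""
    total = 0.0
    for p, q in zip(p_target, p_pred):
        p, q = _clamp(p), _clamp(q)
        total += p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))
    return total / len(p_target)


def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hypothetical KD objective: hard-label BCE plus T^2-scaled KL to the softened Teacher."""
    hard = bce([sigmoid(z) for z in student_logits], labels)
    s_soft = [sigmoid(z / T) for z in student_logits]
    t_soft = [sigmoid(z / T) for z in teacher_logits]
    return alpha * hard + (1 - alpha) * T * T * kl_bernoulli(t_soft, s_soft)


def dml_losses(logits_a, logits_b, labels):
    """Hypothetical DML objective: each peer fits the labels and mimics the other peer."""
    p_a = [sigmoid(z) for z in logits_a]
    p_b = [sigmoid(z) for z in logits_b]
    loss_a = bce(p_a, labels) + kl_bernoulli(p_b, p_a)
    loss_b = bce(p_b, labels) + kl_bernoulli(p_a, p_b)
    return loss_a, loss_b


def f1_score(probs, labels, threshold=0.5):
    """Pixel-level F1 = 2TP / (2TP + FP + FN) for the building class."""
    tp = fp = fn = 0
    for p, y in zip(probs, labels):
        pred = p >= threshold
        if pred and y == 1:
            tp += 1
        elif pred and y == 0:
            fp += 1
        elif not pred and y == 1:
            fn += 1
    denom = 2 * tp + fp + fn
    # Convention (assumption): an empty prediction of an empty mask scores 1.0.
    return 2 * tp / denom if denom else 1.0


def compression_ratio(student_params, teacher_params):
    """Assumed definition: relative reduction in parameter count."""
    return 1.0 - student_params / teacher_params


if __name__ == "__main__":
    labels = [1, 1, 0, 0]               # toy 4-pixel building mask
    teacher = [3.0, 2.0, -2.0, -3.0]    # hypothetical Teacher logits
    student = [1.0, -0.5, -1.0, 0.5]    # hypothetical Student logits
    print(kd_loss(student, teacher, labels))
    print(dml_losses(student, teacher, labels))
    print(f1_score([sigmoid(z) for z in student], labels))  # → 0.5
    print(compression_ratio(1_800_000, 10_000_000))         # hypothetical counts
```

In KD the Teacher's logits come from a frozen, pre-trained network and only the Student is updated, whereas in DML both peers are updated, each minimising its own loss returned by `dml_losses`.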
