Building extraction and height estimation are two fundamental tasks in remote sensing image interpretation, with wide application in urban planning, real-world 3D construction, and other fields. Most existing research treats the two tasks independently, so height information cannot be fully exploited to improve the accuracy of building extraction, and vice versa. In this work, we combine, for the first time, individuaL buIlding extraction and heiGHt estimation in a unified multiTask learning network (LIGHT), which simultaneously outputs a height map, bounding boxes, and a segmentation mask map of buildings. LIGHT consists of an instance segmentation branch and a height estimation branch. To unify the multi-scale features of the two branches and reduce the feature gap between them, we propose a Gated Cross Task Interaction (GCTI) module that efficiently performs feature interaction between the branches. Experiments on the DFC2023 dataset show that LIGHT achieves superior performance and that, with ResNet101 as the backbone, the GCTI module improves multitask learning performance by 2.8% AP50 and 6.5% delta1.
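The abstract reports height estimation quality as delta1 without defining it. A minimal sketch follows, assuming delta1 is the threshold accuracy commonly used in depth and height estimation: the fraction of valid pixels whose ratio max(pred/gt, gt/pred) falls below 1.25. The function name `delta1`, the `eps` guard, and the rule that pixels with non-positive ground truth are skipped are illustrative assumptions, not details taken from the paper.

```python
def delta1(pred, gt, threshold=1.25, eps=1e-6):
    """Threshold accuracy: share of valid pixels with max(p/g, g/p) < threshold.

    pred, gt: flat sequences of predicted and reference heights (same length).
    Pixels whose reference height is not positive are skipped (assumption).
    Returns 0.0 when no pixel is valid.
    """
    hits = 0
    valid = 0
    for p, g in zip(pred, gt):
        if g <= eps:
            continue  # no meaningful ratio for zero or negative reference height
        p = max(p, eps)  # clamp so a zero prediction counts as a miss, not an error
        ratio = max(p / g, g / p)
        valid += 1
        if ratio < threshold:
            hits += 1
    return hits / valid if valid else 0.0


# Four pixels: exact match, 1.5x under-estimate, ~1.11x over-estimate, zero prediction.
score = delta1([10.0, 20.0, 5.0, 0.0], [10.0, 30.0, 4.5, 8.0])
print(score)
```

Under this definition, the two middle cases show why the metric is symmetric: over- and under-estimation by the same factor are penalized equally.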