Semantic segmentation of aerial images is the basis for applications such as automatically creating and updating maps, tracking city growth, and monitoring deforestation. In true orthophotos, which are often used in these applications, many objects and regions can be approximated well by polygons. However, state-of-the-art semantic segmentation models rarely exploit this fact. Instead, most models permit arbitrary region shapes and thus allow unnecessary degrees of freedom in their predictions. We therefore present a refinement of our deep learning model that predicts binary space partitioning (BSP) trees, an efficient polygon representation. The refinements include a new feature decoder architecture and a new differentiable BSP tree renderer, both of which avoid vanishing gradients. Additionally, we designed a novel loss function specifically to improve the spatial partitioning defined by the predicted trees. Furthermore, our expanded model can predict multiple trees at once and can thus produce class-specific segmentations. With all modifications combined, our model achieves state-of-the-art performance while using up to 60% fewer model parameters with a small backbone model or up to 20% fewer with a large backbone model.
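To make the idea of rendering a BSP tree differentiably more concrete, the following is a minimal sketch and not the renderer proposed in this work. It assumes a fixed depth-2 tree whose three inner nodes each hold a line (a, b, c) and whose four leaves each hold a class distribution. Each pixel is softly assigned to the two sides of every line with a sigmoid, which is an illustrative choice made here; a sigmoid can itself saturate, so this sketch does not address the vanishing-gradient issue mentioned above. The function name `render_soft_bsp` and the `sharpness` parameter are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def render_soft_bsp(lines, leaf_probs, size, sharpness=10.0):
    """Render a depth-2 BSP tree into a soft segmentation map.

    lines:      (3, 3) rows (a, b, c) for the root, left child, right child.
                A pixel (x, y) lies on the positive side if a*x + b*y + c > 0.
    leaf_probs: (4, num_classes) class distribution stored in each leaf.
    Returns an array of shape (size, size, num_classes).
    """
    lines = np.asarray(lines, dtype=float)
    leaf_probs = np.asarray(leaf_probs, dtype=float)

    # Pixel coordinates in [-1, 1]; rows index y, columns index x.
    coords = np.linspace(-1.0, 1.0, size)
    ys, xs = np.meshgrid(coords, coords, indexing="ij")

    def positive_side(line):
        a, b, c = line
        # Soft membership in the positive half-plane of the line.
        return sigmoid(sharpness * (a * xs + b * ys + c))

    root = positive_side(lines[0])
    left = positive_side(lines[1])    # splits the negative side of the root
    right = positive_side(lines[2])   # splits the positive side of the root

    # Leaf weight = product of soft memberships along the root-to-leaf path.
    weights = np.stack(
        [
            (1.0 - root) * (1.0 - left),
            (1.0 - root) * left,
            root * (1.0 - right),
            root * right,
        ],
        axis=-1,
    )  # shape (size, size, 4); the weights sum to 1 at every pixel

    # Mix the leaf class distributions by the leaf weights.
    return weights @ leaf_probs
```

Because every operation is smooth in the line parameters and leaf distributions, the same computation written in an automatic differentiation framework would let a pixel-wise loss propagate gradients back to the tree parameters. Predicting several such trees at once, as the expanded model does, would correspond to calling a renderer of this kind once per tree.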