Extracting polygonal building footprints from off-nadir imagery is crucial for diverse applications. Current deep-learning-based extraction approaches predominantly rely on semantic segmentation paradigms followed by post-processing algorithms, which limits their boundary precision and applicability. Moreover, existing polygonal extraction methodologies are inherently designed for near-nadir imagery and fail under the geometric complexities introduced by off-nadir viewing angles. To address these challenges, this paper introduces the Polygonal Footprint Network (PolyFootNet), a novel deep-learning framework that directly outputs polygonal building footprints without requiring external post-processing steps. PolyFootNet employs a High-Quality Mask Prompter to generate precise roof masks, which guide polygonal vertex extraction within a unified model pipeline. A key contribution of PolyFootNet is the Self Offset Attention mechanism, grounded in Nadaraya-Watson regression, which effectively mitigates the accuracy discrepancy observed between low-rise and high-rise buildings. This mechanism allows low-rise building predictions to leverage angular corrections learned from high-rise building offsets, significantly enhancing overall extraction accuracy. Additionally, motivated by the inherent ambiguity of building footprint extraction tasks, we systematically investigate alternative extraction paradigms and demonstrate that combining building masks with offsets achieves superior polygonal footprint results. Extensive experiments validate PolyFootNet's effectiveness, illustrating its promise as a robust, generalizable, and precise method for polygonal building footprint extraction from challenging off-nadir imagery. To facilitate further research, we will release pre-trained weights of our offset prediction module at https://github.com/likaiucas/PolyFootNet.
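To make the Nadaraya-Watson connection concrete, the following is a minimal, illustrative sketch (not the paper's implementation) of kernel regression in its attention form: with a Gaussian kernel, the estimate is a softmax-weighted average of values, where weights decay with query-key distance. The variable names and toy numbers below are hypothetical, chosen only to mirror the idea of low-rise predictions borrowing angular corrections learned from high-rise offsets.

```python
import numpy as np

def nadaraya_watson(query, keys, values, bandwidth=1.0):
    """Nadaraya-Watson kernel regression as attention.

    With a Gaussian kernel, the normalized kernel weights are exactly a
    softmax over negative scaled squared distances, so the estimator is
    an attention-weighted average of the values.
    """
    # Squared distance between the scalar query and each key.
    sq_dist = (keys - query) ** 2
    # Gaussian kernel in log space; subtract the max for numerical stability.
    logits = -sq_dist / (2.0 * bandwidth ** 2)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    # Weighted average of values (the regression estimate).
    return float(weights @ values)

# Toy illustration (hypothetical numbers): keys play the role of observed
# high-rise offset magnitudes, values the associated angular corrections,
# and the query a low-rise offset that borrows a smoothed correction.
keys = np.array([0.2, 0.5, 0.9])
values = np.array([10.0, 12.0, 15.0])
correction = nadaraya_watson(0.4, keys, values, bandwidth=0.3)
```

The estimate always lies within the range of the values, and the bandwidth controls how sharply the attention concentrates on the nearest keys.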