Building Footprint Extraction (BFE) from off-nadir aerial images typically relies on roof segmentation followed by offset prediction, which shifts roof boundaries onto the building footprint. However, this multi-stage pipeline often yields low-quality results, limiting its applicability in real-world data production. To address this issue, we present OBMv2, an end-to-end, promptable model for polygonal footprint prediction. Unlike its predecessor OBM, OBMv2 introduces a novel Self Offset Attention (SOFA) mechanism that improves performance across diverse building types, from bungalows to skyscrapers, enabling end-to-end footprint prediction without post-processing. In addition, we propose a Multi-level Information System (MISS) that jointly exploits roof masks, building masks, and offsets for accurate footprint prediction. We evaluate OBMv2 on the BONAI and OmniCity-view3 datasets and demonstrate its generalization on the Huizhou test set. The code will be available at https://github.com/likaiucas/OBMv2.