Building Footprint Extraction (BFE) in off-nadir aerial images often relies on roof segmentation and roof-to-footprint offset prediction, then dragging the roof to the footprint via the offset. However, the results of this multi-stage inference are hard to use in data production because the predicted masks are of low quality. To solve this problem, we propose OBMv2, which supports both end-to-end and promptable polygonal footprint prediction. Unlike OBM, OBMv2 uses a newly proposed Self Offset Attention (SOFA) to bridge the performance gap between bungalows and skyscrapers, which realizes true end-to-end footprint polygon prediction without postprocessing such as Non-Maximum Suppression (NMS) and Distance NMS (DNMS). To make full use of the information contained in roof masks, building masks, and offsets, we propose a Multi-level Information SyStem (MISS) for footprint prediction, with which OBMv2 can predict footprints even when some of these predictions are insufficient. Additionally, to extract more information from the same model, we draw inspiration from Retrieval-Augmented Generation (RAG) in Natural Language Processing and propose the "RAG in BFE" problem. To verify the effectiveness of the proposed method, experiments were conducted on the open datasets BONAI and OmniCity-view3. A generalization test was also conducted on the Huizhou test set. The code will be available at \url{https://github.com/likaiucas/OBM}.
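The multi-stage baseline described above, in which a roof polygon is dragged to the footprint by a predicted offset, can be illustrated with a minimal sketch. It assumes the footprint has the same shape as the roof and differs only by a rigid 2D translation in pixel coordinates. The function name `drag_roof_to_footprint` and the numeric values are hypothetical and are not taken from the OBM code base.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def drag_roof_to_footprint(roof: List[Point], offset: Point) -> List[Point]:
    """Translate every roof vertex by the roof-to-footprint offset (dx, dy).

    Assumption: roof and footprint share the same shape, so a single
    offset vector per building is enough to move one onto the other.
    """
    dx, dy = offset
    return [(x + dx, y + dy) for x, y in roof]


# Hypothetical roof polygon (pixel coordinates) and predicted offset.
roof_polygon = [(10.0, 20.0), (30.0, 20.0), (30.0, 40.0), (10.0, 40.0)]
predicted_offset = (-4.0, 6.0)

footprint_polygon = drag_roof_to_footprint(roof_polygon, predicted_offset)
print(footprint_polygon)
```

Under this assumption, any error in the roof mask or in the offset carries over directly to the footprint, which is the weakness of multi-stage inference that the abstract points out.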