Most urban applications require building footprints as concise vector graphics with sharp boundaries rather than as pixel-wise raster images. Most existing methods, however, generate over-smoothed footprint polygons, and editing these automatically produced polygons can be inefficient, sometimes even more time-consuming than manual digitization. This paper introduces a semi-automatic approach to building footprint extraction based on semantically-sensitive superpixels and graph neural networks. Drawing inspiration from object-based classification techniques, we first learn to generate superpixels that are both boundary-preserving and semantically-sensitive. The superpixels respond only to building boundaries rather than to the boundaries of other objects, and the same step simultaneously produces a semantic segmentation of the buildings. These intermediate superpixel representations can naturally be treated as nodes in a graph. Graph neural networks are therefore employed to model the global interactions among all superpixels and to make the node features more representative for building segmentation. Classical approaches are then used to extract and regularize the boundaries of the vectorized building footprints. With a few clicks and simple strokes, users can efficiently obtain accurate segmentation results without editing polygon vertices. Experiments on several public benchmark datasets show that the proposed approach is more accurate and more efficient than established techniques, with an improvement of 8% in AP50 in the vector graphics evaluation. We have also devised an optimized and sophisticated pipeline for interactive editing that can further improve the overall quality of the results.
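The idea of treating superpixels as graph nodes can be illustrated with a minimal sketch. It assumes a superpixel label map is already available as an integer array and uses a parameter-free mean aggregation as a stand-in for a learned graph neural network layer. The function names `superpixel_adjacency` and `mean_aggregate`, the toy label map, and the per-node scores are hypothetical and are not taken from the paper.

```python
import numpy as np


def superpixel_adjacency(labels: np.ndarray) -> np.ndarray:
    """Build a symmetric boolean adjacency matrix from a superpixel label map.

    Two superpixels are linked when they share at least one horizontal or
    vertical pixel border (4-connectivity, an assumption of this sketch).
    """
    n = int(labels.max()) + 1
    adj = np.zeros((n, n), dtype=bool)
    # Pairs of horizontally and vertically neighbouring pixels.
    pairs = [
        (labels[:, :-1], labels[:, 1:]),
        (labels[:-1, :], labels[1:, :]),
    ]
    for a, b in pairs:
        border = a != b  # pixel pairs that straddle two superpixels
        adj[a[border], b[border]] = True
        adj[b[border], a[border]] = True
    return adj


def mean_aggregate(features: np.ndarray, adj: np.ndarray) -> np.ndarray:
    """One message-passing step: average each node with its neighbours.

    This is not a trained layer; it only shows how information flows
    along the edges of the superpixel graph.
    """
    a_hat = adj.astype(float) + np.eye(adj.shape[0])  # add self-loops
    return (a_hat @ features) / a_hat.sum(axis=1, keepdims=True)


# Toy label map with four superpixels (hypothetical data).
labels = np.array([
    [0, 0, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 3, 3],
])
adj = superpixel_adjacency(labels)

# Hypothetical per-superpixel "building" scores, one feature per node.
features = np.array([[1.0], [1.0], [0.0], [0.0]])
smoothed = mean_aggregate(features, adj)
```

In this toy map, superpixels 0 and 3 do not touch, so they exchange information only after a second aggregation step. Stacking several learned layers is the usual way such a graph lets distant regions influence each other.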