SuperpixelGraph: Semi-automatic generation of building footprint through semantic-sensitive superpixel and neural graph networks

Most urban applications require building footprints as concise vector graphics with sharp boundaries rather than as pixel-wise raster images. This need contrasts with the majority of existing methods, which typically generate over-smoothed footprint polygons. Editing these automatically produced polygons can be inefficient, sometimes taking longer than manual digitization. This paper introduces a semi-automatic approach for building footprint extraction based on semantically sensitive superpixels and graph neural networks. Drawing inspiration from object-based classification techniques, we first learn to generate superpixels that are not only boundary-preserving but also semantically sensitive. The superpixels respond exclusively to building boundaries rather than to other natural objects, while simultaneously producing a semantic segmentation of the buildings. These intermediate superpixel representations can naturally be treated as nodes in a graph. Consequently, graph neural networks are employed to model the global interactions among all superpixels and to make the node features more representative for building segmentation. Classical approaches are then used to extract and regularize boundaries for the vectorized building footprints. With a few clicks and simple strokes, users can efficiently obtain accurate segmentation results without editing polygon vertices. Experiments on several public benchmark datasets show that the proposed approach achieves superior precision and efficacy. Compared with established techniques, we observe a 10% improvement in the superpixel clustering metric and an 8% improvement in the vector graphics evaluation. We have also devised an optimized pipeline for interactive editing that can further improve the overall quality of the results.
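To make the "superpixels as graph nodes" idea concrete, the following is a minimal sketch, not the method used in this paper. It assumes a superpixel label map is already available, links superpixels that share a 4-connected pixel border, and applies one round of plain neighbourhood averaging as a stand-in for a learned graph neural network layer. The function names `build_superpixel_graph` and `mean_aggregate`, the toy label map, and the single "building score" feature are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_superpixel_graph(labels):
    """Return undirected edges between superpixels that touch (4-connectivity).

    `labels` is a 2D list of integer superpixel ids, one id per pixel.
    """
    edges = set()
    height, width = len(labels), len(labels[0])
    for r in range(height):
        for c in range(width):
            a = labels[r][c]
            # Checking only the right and bottom neighbours visits each
            # adjacent pixel pair exactly once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < height and cc < width:
                    b = labels[rr][cc]
                    if a != b:
                        edges.add((min(a, b), max(a, b)))
    return edges


def mean_aggregate(features, edges):
    """One round of neighbourhood averaging (self included) over node features.

    This is an unlearned placeholder for a graph neural network layer.
    """
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    updated = {}
    for node, feat in features.items():
        group = [feat] + [features[n] for n in sorted(neighbours[node])]
        updated[node] = [sum(vals) / len(group) for vals in zip(*group)]
    return updated


# Toy 2x6 label map with three superpixels laid out left to right.
labels = [
    [0, 0, 1, 1, 2, 2],
    [0, 0, 1, 1, 2, 2],
]
edges = build_superpixel_graph(labels)

# Hypothetical one-dimensional feature per superpixel (a "building score").
features = {0: [1.0], 1: [0.0], 2: [1.0]}
smoothed = mean_aggregate(features, edges)
print(sorted(edges))
print(smoothed)
```

In this toy case superpixel 1 borders both of the others, so its score is pulled toward its neighbours after one aggregation step. A trained network would replace the fixed average with learned weights and would typically use richer per-superpixel features.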
