Remote sensing applications commonly need to learn scale-invariant shapes of objects such as buildings. Prior work relies on tuning multiple loss functions to convert segmentation maps into the final scale-invariant representation, which requires arduous design and optimization. To address this, we introduce GeoFormer, a novel architecture that learns to generate multipolygons end-to-end. By modeling keypoints as spatially dependent tokens in an autoregressive manner, GeoFormer outperforms existing work in delineating buildings from satellite imagery. We evaluate the robustness of GeoFormer against prior methods through a variety of parameter ablations and highlight the advantages of optimizing a single likelihood function. Our study presents the first successful application of autoregressive transformer models to multipolygon prediction in remote sensing, suggesting a promising methodological alternative for building vectorization.
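To make the idea of keypoints as tokens concrete, the following is a minimal sketch of how a multipolygon could be flattened into a single token sequence and recovered again. It is not GeoFormer's actual tokenization: the bin count `NUM_BINS`, the special tokens `SEP` and `EOS`, and the helper names `quantize`, `encode`, and `decode` are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Polygon = List[Point]

# Hypothetical vocabulary layout (an assumption, not taken from the paper).
NUM_BINS = 224       # coordinate bins per axis
SEP = NUM_BINS       # separates polygons that belong to the same image
EOS = NUM_BINS + 1   # ends the whole sequence


def quantize(value: float, image_size: float) -> int:
    """Map a coordinate in [0, image_size] to an integer bin in [0, NUM_BINS - 1]."""
    bin_index = int(value * NUM_BINS / image_size)
    return min(max(bin_index, 0), NUM_BINS - 1)


def encode(polygons: List[Polygon], image_size: float) -> List[int]:
    """Flatten polygons into x, y coordinate tokens with SEP between polygons and EOS at the end."""
    tokens: List[int] = []
    for i, polygon in enumerate(polygons):
        if i > 0:
            tokens.append(SEP)
        for x, y in polygon:
            tokens.append(quantize(x, image_size))
            tokens.append(quantize(y, image_size))
    tokens.append(EOS)
    return tokens


def decode(tokens: List[int], image_size: float) -> List[Polygon]:
    """Rebuild polygons from a token sequence, placing each vertex at its bin center."""
    polygons: List[Polygon] = []
    coords: List[float] = []
    for token in tokens:
        if token in (SEP, EOS):
            if coords:
                # Pair up alternating x and y values into vertices.
                polygons.append(list(zip(coords[0::2], coords[1::2])))
            coords = []
            if token == EOS:
                break
        else:
            coords.append((token + 0.5) * image_size / NUM_BINS)
    return polygons


example = [
    [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)],
    [(100.0, 100.0), (120.0, 100.0), (120.0, 130.0)],
]
example_tokens = encode(example, image_size=224.0)
example_polygons = decode(example_tokens, image_size=224.0)
```

Under this assumed scheme, an autoregressive model would predict each token conditioned on the image and on all previous tokens, so training reduces to maximizing one sequence likelihood (the product of the per-token conditional probabilities) rather than balancing several task-specific losses.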