We present a novel pipeline for learning the conditional distribution of a building roof mesh given pixels from an aerial image, under the assumption that roof geometry follows a set of regular patterns. Unlike alternative methods that require multiple images of the same object, our approach estimates 3D roof meshes from a single image. The approach employs PolyGen, a deep generative transformer architecture for 3D meshes. We apply this model in a new domain and investigate its sensitivity to image resolution. We propose a novel metric to evaluate the quality of the inferred meshes. Our results show that the model is robust even at lower resolutions and that, qualitatively, it produces realistic representations for out-of-distribution samples.