Reconstructing a structured vector-graphics representation from a rasterized floorplan image is typically an important prerequisite for computational tasks involving floorplans, such as automated understanding or CAD workflows. However, existing techniques struggle to faithfully recover the structure and semantics conveyed by complex floorplans that depict large indoor spaces with many rooms and varying numbers of polygon corners. To this end, we propose Raster2Seq, which frames floorplan reconstruction as a sequence-to-sequence task in which floorplan elements (such as rooms, windows, and doors) are represented as labeled polygon sequences that jointly encode geometry and semantics. Our approach introduces an autoregressive decoder that learns to predict the next corner conditioned on image features and previously generated corners, guided by learnable anchors. These anchors represent spatial coordinates in image space and thereby direct the attention mechanism toward informative image regions. By embracing the autoregressive mechanism, our method offers flexibility in the output format, allowing it to efficiently handle complex floorplans with numerous rooms and diverse polygon structures. Our method achieves state-of-the-art performance on standard benchmarks such as Structured3D, CubiCasa5K, and Raster2Graph, while also demonstrating strong generalization to more challenging datasets like WAFFLE, which contains diverse room structures and complex geometric variations.
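To make the labeled-polygon-sequence idea concrete, the following is a minimal sketch of how floorplan elements might be flattened into a single token sequence and recovered from it. It is an illustration under stated assumptions, not the Raster2Seq implementation: the names `LabeledPolygon`, `to_sequence`, and `from_sequence`, the `<sep>` and `<eos>` markers, the label-first ordering, and the 256-bin coordinate quantization are all hypothetical choices made for this example.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical special tokens; the abstract does not specify the vocabulary.
SEP = ("<sep>",)   # closes one polygon
EOS = ("<eos>",)   # closes the whole floorplan


@dataclass
class LabeledPolygon:
    label: str                            # e.g. "room", "door", "window"
    corners: List[Tuple[float, float]]    # (x, y) in pixel coordinates


def quantize(value: float, size: int, bins: int) -> int:
    """Map a pixel coordinate in [0, size) to an integer bin in [0, bins - 1]."""
    return min(bins - 1, max(0, int(value / size * bins)))


def to_sequence(polygons: List[LabeledPolygon], image_size: int, bins: int = 256):
    """Flatten polygons into one sequence: label, corners..., <sep>, ..., <eos>."""
    tokens = []
    for poly in polygons:
        tokens.append(("label", poly.label))          # assumed: label comes first
        for x, y in poly.corners:
            tokens.append(("corner",
                           quantize(x, image_size, bins),
                           quantize(y, image_size, bins)))
        tokens.append(SEP)
    tokens.append(EOS)
    return tokens


def from_sequence(tokens) -> List[LabeledPolygon]:
    """Rebuild polygons (with quantized corners) from a token sequence."""
    polygons, label, corners = [], None, []
    for tok in tokens:
        if tok == EOS:
            break
        if tok == SEP:
            polygons.append(LabeledPolygon(label, corners))
            label, corners = None, []
        elif tok[0] == "label":
            label = tok[1]
        elif tok[0] == "corner":
            corners.append((tok[1], tok[2]))
    return polygons


# Example: one square room and one thin door polygon on a 256x256 image.
plan = [
    LabeledPolygon("room", [(0, 0), (128, 0), (128, 128), (0, 128)]),
    LabeledPolygon("door", [(60, 0), (70, 0), (70, 4), (60, 4)]),
]
sequence = to_sequence(plan, image_size=256)
recovered = from_sequence(sequence)
```

Because each polygon is terminated by a separator rather than padded to a fixed corner count, the same format covers a four-corner room and a polygon with many more corners. In an autoregressive setting, a decoder would emit such tokens one at a time until it produces the end marker.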