Converting raster floorplans into engineering-grade vector graphics is challenging because of their complex topology and strict geometric constraints. To address this, we present FloorplanVLM, a unified framework that reformulates floorplan vectorization as an image-conditioned sequence modeling task. Unlike pixel-based methods, which rely on fragile heuristics, and query-based transformers, which generate fragmented rooms, our model directly outputs structured JSON sequences that represent the global topology. This 'pixels-to-sequence' paradigm enables precise, holistic satisfaction of geometric constraints, even for complex geometries such as slanted walls and curved arcs. To support this data-hungry approach, we introduce a scalable data engine: we construct a large-scale dataset (Floorplan-2M) and a high-fidelity subset (Floorplan-HQ-300K) to balance geometric diversity and pixel-level precision. We then employ a progressive training strategy: Supervised Fine-Tuning (SFT) for structural grounding and quality annealing, followed by Group Relative Policy Optimization (GRPO) for strict geometric alignment. To standardize evaluation on complex layouts, we establish and open-source FPBench-2K. On this rigorous benchmark, FloorplanVLM demonstrates exceptional structural validity, achieving $\textbf{92.52\%}$ external-wall IoU, and generalizes robustly across non-Manhattan architectures.
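To make the GRPO stage more concrete, the following is a minimal sketch. It does not show the authors' implementation. It assumes that each sampled JSON output has already been rasterized into a binary external-wall mask and that the reward is the IoU of that mask against the ground truth. The abstract does not specify the reward, so this choice is an assumption. The helper names `mask_iou` and `group_relative_advantages` are hypothetical. The advantage step follows the standard GRPO recipe: every sample in a group is scored relative to the group's mean reward and normalized by the group's standard deviation. Using the population standard deviation here is one possible choice.

```python
import statistics
from typing import List, Sequence

# A binary mask: rows of booleans, True where the (assumed) external-wall region is.
Mask = Sequence[Sequence[bool]]


def mask_iou(pred: Mask, gt: Mask) -> float:
    """IoU between two same-shaped binary masks (hypothetical reward helper)."""
    if len(pred) != len(gt) or any(len(a) != len(b) for a, b in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            inter += p and g  # bool adds as 0/1
            union += p or g
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """GRPO-style advantages: (r_i - mean(r)) / (std(r) + eps) within one group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; an implementation choice
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    gt = [[True, True, False],
          [True, True, False]]
    # Three hypothetical rasterized samples for the same input image.
    samples = [
        [[True, True, False], [True, True, False]],    # exact match
        [[True, False, False], [True, False, False]],  # covers half the region
        [[False, False, True], [False, False, True]],  # disjoint
    ]
    rewards = [mask_iou(s, gt) for s in samples]
    print(rewards)  # [1.0, 0.5, 0.0]
    print(group_relative_advantages(rewards))
```

Because advantages are computed relative to the group, samples above the group mean are reinforced and samples below it are penalized. No separate value network is needed, which is the main appeal of GRPO over critic-based policy optimization.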