We study the task of panoptic symbol spotting, which involves identifying both individual instances of countable things and the semantic regions of uncountable stuff in computer-aided design (CAD) drawings composed of vector graphical primitives. Existing methods typically rely on image rasterization, graph construction, or point-based representations, but these approaches often suffer from high computational cost, limited generality, and loss of geometric structural information. In this paper, we propose VecFormer, a novel method that addresses these challenges through a line-based representation of primitives. This design preserves the geometric continuity of the original primitives, enabling more accurate shape representation while maintaining a computation-friendly structure, which makes it well suited to vector graphic understanding tasks. To further enhance prediction reliability, we introduce a Branch Fusion Refinement module that effectively integrates instance and semantic predictions, resolving their inconsistencies to produce more coherent panoptic outputs. Extensive experiments demonstrate that our method establishes a new state of the art, achieving 91.1 PQ, with Stuff-PQ improved by 9.6 and 21.2 points over the second-best results under settings with and without prior information, respectively, highlighting the strong potential of line-based representation as a foundation for vector graphic understanding.
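As a toy illustration of what a line-based representation of a curved primitive might look like (this sketch and its function name are our own assumptions for exposition, not the paper's actual encoding), a circular-arc primitive can be sampled into a chain of short line segments whose shared endpoints preserve the arc's geometric continuity:

```python
import math

def arc_to_segments(cx, cy, r, theta0, theta1, n=8):
    """Approximate a circular-arc primitive by n chained line segments.

    Hypothetical sketch of a line-based representation: the curved
    primitive is sampled into short lines, and consecutive segments
    share endpoints, so the original geometric continuity is kept.
    """
    # Sample n + 1 points uniformly in angle along the arc.
    pts = [
        (cx + r * math.cos(theta0 + (theta1 - theta0) * i / n),
         cy + r * math.sin(theta0 + (theta1 - theta0) * i / n))
        for i in range(n + 1)
    ]
    # Pair consecutive points into line segments.
    return list(zip(pts[:-1], pts[1:]))

# Quarter circle of radius 1, approximated by 4 segments.
segs = arc_to_segments(0.0, 0.0, 1.0, 0.0, math.pi / 2, n=4)
```

Because adjacent segments reuse the same sampled endpoint, the polyline is continuous by construction, which is the property the abstract highlights as lost by purely point-based representations.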