Structural Evaluation Metrics for SVG Generation via Leave-One-Out Analysis

Scalable Vector Graphics (SVG) represent visual content as structured, editable code: each element (path, shape, or text node) can be individually inspected, transformed, or removed. This structural editability is a primary motivation for SVG generation. Yet prevailing evaluation protocols mostly reduce the output to a single similarity score against a reference image or input text. Such scores measure how faithfully the result reproduces an image or follows instructions, but not whether it preserves the structural properties that make SVG valuable. In particular, existing metrics cannot determine which generated elements contribute positively to overall visual quality, how visual concepts map to specific parts of the code, or whether the output supports meaningful downstream editing. We introduce element-level leave-one-out (LOO) analysis, inspired by the classic jackknife estimator. The procedure renders the SVG with and without each element, measures the resulting visual change, and derives a suite of structural quality metrics from these measurements. The method is simple, but the jackknife's ability to decompose an aggregate statistic into per-sample contributions carries over directly to this setting. From this single mechanism, we obtain: (1) per-element quality scores through LOO scoring, which enable zero-shot artifact detection; (2) concept-element attribution, which maps each element to the visual concept it serves; and (3) four structural metrics (purity, coverage, compactness, and locality) that quantify SVG modularity from complementary perspectives. We validate these metrics on more than 19,000 edits of five types, spanning five generation systems and three complexity tiers.
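To make the with/without rendering loop concrete, here is a minimal sketch. It computes one score per element as score_i = distance(render(S), render(S without element i)). It is an illustration under stated assumptions, not the authors' implementation. The assumptions: only top-level children of the <svg> root count as elements (nested groups are not expanded); the renderer and the distance are pluggable callables; `toy_render` is a hypothetical rasterizer that handles only integer-geometry <rect> elements, so the example runs with the standard library alone; and `mean_abs_diff` is a simple pixel metric chosen for illustration, not the paper's measure of visual change. In practice a real SVG rasterizer and a perceptual distance would replace both.

```python
import copy
import xml.etree.ElementTree as ET
from typing import Callable, List

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize as <svg xmlns="..."> rather than ns0:svg

Image = List[List[float]]  # grayscale raster, 1.0 = white, 0.0 = black


def loo_scores(svg_text: str,
               render: Callable[[str], Image],
               distance: Callable[[Image, Image], float]) -> List[float]:
    """One leave-one-out score per top-level child of the <svg> root:
    distance(render(full SVG), render(SVG with that child removed))."""
    root = ET.fromstring(svg_text)
    reference = render(svg_text)
    scores = []
    for i in range(len(root)):
        ablated = copy.deepcopy(root)
        ablated.remove(ablated[i])  # drop only the i-th element
        ablated_text = ET.tostring(ablated, encoding="unicode")
        scores.append(distance(reference, render(ablated_text)))
    return scores


def toy_render(svg_text: str) -> Image:
    """HYPOTHETICAL stand-in rasterizer: paints only <rect> elements with
    integer geometry and '#rrggbb' fills, in document order, on white."""
    root = ET.fromstring(svg_text)
    w, h = int(root.get("width")), int(root.get("height"))
    img = [[1.0] * w for _ in range(h)]
    for el in root:
        if el.tag.rsplit("}", 1)[-1] != "rect":
            continue  # ignore everything that is not a rectangle
        x, y = int(el.get("x", "0")), int(el.get("y", "0"))
        rw, rh = int(el.get("width")), int(el.get("height"))
        hexcol = el.get("fill", "#000000").lstrip("#")
        gray = sum(int(hexcol[k:k + 2], 16) for k in (0, 2, 4)) / (3 * 255)
        for r in range(max(y, 0), min(y + rh, h)):  # clip to the canvas
            for c in range(max(x, 0), min(x + rw, w)):
                img[r][c] = gray
    return img


def mean_abs_diff(a: Image, b: Image) -> float:
    """Mean absolute per-pixel difference (an illustrative distance only)."""
    n = sum(len(row) for row in a)
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb)) / n


svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">'
    '<rect x="0" y="0" width="5" height="10" fill="#000000"/>'    # left half
    '<rect x="0" y="0" width="5" height="10" fill="#000000"/>'    # exact duplicate
    '<rect x="5" y="0" width="5" height="5" fill="#000000"/>'     # top-right quarter
    '<rect x="20" y="20" width="3" height="3" fill="#000000"/>'   # entirely off-canvas
    '</svg>'
)
print(loo_scores(svg, toy_render, mean_abs_diff))  # [0.0, 0.0, 0.25, 0.0]
```

The off-canvas rectangle scores zero because removing it changes nothing visible, which is one plausible kind of artifact a per-element score can flag. The exact artifact-detection rule used in this work is not reproduced here. The duplicated pair also shows a known limitation of plain leave-one-out: when two elements are fully redundant, each scores zero, because the other still covers the same pixels.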