The success of denoising diffusion models in representing rich data distributions over 2D raster images has prompted research on extending them to other data representations, such as vector graphics. Unfortunately, due to the variable structure of vector graphics and the scarcity of vector training data, directly applying diffusion models to this domain remains challenging. Workarounds such as optimization via Score Distillation Sampling (SDS) are also fraught with difficulty, as vector representations are non-trivial to optimize directly and tend to produce implausible geometries such as redundant or self-intersecting shapes. NIVeL addresses these challenges by reinterpreting the problem in an alternative, intermediate domain that preserves the desirable properties of vector graphics, chiefly sparsity of representation and resolution independence. This domain is based on neural implicit fields expressed as a set of decomposable, editable layers. In our experiments, NIVeL produces text-to-vector graphics results of significantly higher quality than the state of the art.