Traditional statistical graphics are precise but often lack the visual appeal, memorability, and engagement of pictorial charts. We present a generative framework for the automated synthesis of pictorial charts that bridges the gap between semantic expression and structural faithfulness. Rather than treating charts merely as images to be stylized, we frame the problem as a dual-conditioned generation task guided by two parallel external control signals: a text prompt capturing the semantic context of the editing intent, and a context image providing the abstract statistical chart's global structure. To reinforce these controls within a Multi-Modal Diffusion Transformer, we introduce two complementary feature-level mechanisms: structural alignment to anchor spatial layouts to the input chart, and semantic alignment to transfer expressive textures from reference images. Generalizing across major visual channels (i.e., length, area, angle, and position) and diverse semantic domains, our method produces pictorial charts that are both artistically compelling and structurally consistent. Extensive quantitative evaluations and perceptual user studies demonstrate that our framework outperforms traditional controllable generation and image editing baselines, providing a foundation for high-fidelity, data-driven generative modeling in expressive visual storytelling. Project page: https://ssalign.github.io/.