We investigate the challenges of style transfer in multi-modal visual narratives. Static visual narratives such as comics and manga have distinct visual styles of presentation. These styles span multiple dimensions, such as panel layout, size, shape, and color, and they combine visual and textual media elements. The layout of text and visual elements also matters for narrative communication, and the sequential transitions between panels are where readers make inferences about the narrative world. These feature differences pose an interesting challenge for style transfer, because the features of each modality must be processed differently. We introduce the notion of comprehension-preserving style transfer (CPST) in such multi-modal domains. CPST requires not only traditional metrics of style transfer but also metrics of narrative comprehension. To spur further research in this area, we present an annotated dataset of comics and manga and an initial set of algorithms that use separate style transfer modules for the visual, textual, and layout parameters. To test whether style transfer preserves narrative semantics, we evaluate these algorithms through visual story cloze tests inspired by work in computational cognition of narrative systems. Understanding the connection between style and narrative semantics provides insight for applications ranging from informational brochure design to data storytelling.