Text plays a crucial role in the transmission of human civilization, and teaching machines to generate online handwritten text in various styles is an interesting and significant challenge. However, most prior work has concentrated on generating individual Chinese fonts, leaving complete text line generation largely unexplored. In this paper, we observe that a text line divides naturally into two components: layout and glyphs. Based on this division, we design a text line layout generator coupled with a diffusion-based stylized font synthesizer to address the challenge hierarchically. More concretely, the layout generator performs in-context-like learning on the text content and the provided style references to generate the position of each glyph autoregressively. Meanwhile, the font synthesizer, which consists of a character embedding dictionary, a multi-scale calligraphy style encoder, and a 1D U-Net based diffusion denoiser, generates each glyph at its position while imitating the calligraphy style extracted from the given style references. Qualitative and quantitative experiments on the CASIA-OLHWDB dataset demonstrate that our method generates structurally correct imitation samples that are difficult to distinguish from real handwriting.
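The two-stage decomposition described above can be illustrated with a minimal sketch. It is not the method of the paper: `Box`, `generate_layout`, `placeholder_glyph`, and `render_line` are hypothetical names, the layout step is replaced by simple averages over the reference boxes, and the diffusion-based synthesizer is replaced by a stub that returns a single stroke. The sketch only shows the control flow, in which positions are produced one glyph at a time and each glyph is then drawn inside its own box.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Box:
    """Axis-aligned glyph box: top-left corner plus width and height."""
    x: float
    y: float
    w: float
    h: float


def generate_layout(text: str, style_boxes: Sequence[Box]) -> List[Box]:
    """Toy stand-in for the layout generator (an assumption, not the paper's model).

    Glyphs are placed one at a time; each new box depends on statistics of
    the style references and on the previously generated box.
    """
    if not style_boxes:
        raise ValueError("need at least one reference box")
    w = mean(b.w for b in style_boxes)
    h = mean(b.h for b in style_boxes)
    y = mean(b.y for b in style_boxes)
    # Average horizontal gap between consecutive reference glyphs.
    gaps = [b2.x - (b1.x + b1.w) for b1, b2 in zip(style_boxes, style_boxes[1:])]
    gap = mean(gaps) if gaps else 0.0

    layout: List[Box] = []
    for _ in text:
        # The first glyph starts at x = 0; later glyphs follow the previous box.
        x = layout[-1].x + layout[-1].w + gap if layout else 0.0
        layout.append(Box(x, y, w, h))
    return layout


def placeholder_glyph(char: str) -> List[Point]:
    """Hypothetical synthesizer stub: one diagonal stroke in the unit square."""
    return [(0.0, 0.0), (1.0, 1.0)]


def render_line(
    text: str,
    style_boxes: Sequence[Box],
    synthesize: Callable[[str], List[Point]] = placeholder_glyph,
) -> List[Tuple[str, List[Point]]]:
    """Stage 1 generates the layout; stage 2 maps each glyph into its box."""
    line = []
    for char, box in zip(text, generate_layout(text, style_boxes)):
        stroke = [(box.x + px * box.w, box.y + py * box.h)
                  for px, py in synthesize(char)]
        line.append((char, stroke))
    return line
```

In a real system, `generate_layout` would be a learned autoregressive model and `synthesize` would be a style-conditioned generative model; the hierarchy between the two stages is the only part this sketch is meant to convey.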