Text plays a crucial role in the transmission of human civilization, and teaching machines to generate online handwritten text in various styles is an interesting and significant challenge. However, most prior work has concentrated on generating individual Chinese characters, leaving complete text line generation largely unexplored. In this paper, we observe that a text line naturally divides into two components: layout and glyphs. Based on this division, we design a text line layout generator coupled with a diffusion-based stylized font synthesizer to address the challenge hierarchically. More concretely, the layout generator performs in-context-like learning on the text content and the provided style references to generate the position of each glyph autoregressively. The font synthesizer, which consists of a character embedding dictionary, a multi-scale calligraphy style encoder, and a 1D U-Net based diffusion denoiser, then generates each glyph at its position while imitating the calligraphy style extracted from the given style references. Qualitative and quantitative experiments on CASIA-OLHWDB demonstrate that our method generates structurally correct imitation samples that are difficult to distinguish from genuine handwriting.
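The layout/glyph decomposition described above can be illustrated with a minimal sketch. It is not the paper's implementation: `Box`, `toy_next_box`, `toy_glyph`, and `generate_text_line` are hypothetical names, the learned layout generator is replaced by a rule that copies the average box size and spacing of the style references, and the diffusion-based synthesizer is replaced by a fixed stroke. Only the control flow is meant to carry over: positions are produced one glyph at a time, each conditioned on the positions already generated, and each glyph is then drawn inside its box.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    """Axis-aligned glyph bounding box: top-left corner plus size."""
    x: float
    y: float
    w: float
    h: float


def toy_next_box(char: str, context: Sequence[Box], style: Sequence[Box]) -> Box:
    """Hypothetical stand-in for the learned layout model.

    Copies the average size, vertical position, and horizontal gap of the
    style reference boxes, and places the new box to the right of the last
    generated one. A real model would also use `char`; this toy ignores it.
    """
    n = len(style)
    w = sum(b.w for b in style) / n
    h = sum(b.h for b in style) / n
    y = sum(b.y for b in style) / n
    gaps = [b2.x - (b1.x + b1.w) for b1, b2 in zip(style, style[1:])]
    gap = sum(gaps) / len(gaps) if gaps else 0.0
    if not context:
        return Box(0.0, y, w, h)
    last = context[-1]
    return Box(last.x + last.w + gap, y, w, h)


def generate_layout(
    text: str,
    style: Sequence[Box],
    next_box: Callable[[str, Sequence[Box], Sequence[Box]], Box] = toy_next_box,
) -> List[Box]:
    """Stage 1: emit one box per character, conditioned on earlier boxes."""
    boxes: List[Box] = []
    for char in text:
        boxes.append(next_box(char, boxes, style))
    return boxes


def toy_glyph(char: str) -> List[Point]:
    """Hypothetical stand-in for the glyph synthesizer.

    Returns a single diagonal stroke in the unit square for any character.
    """
    return [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)]


def place_glyph(points: Sequence[Point], box: Box) -> List[Point]:
    """Stage 2 (placement only): map unit-square points into the box."""
    return [(box.x + px * box.w, box.y + py * box.h) for px, py in points]


def generate_text_line(text: str, style: Sequence[Box]) -> List[List[Point]]:
    """Run both stages: layout first, then one glyph per generated box."""
    boxes = generate_layout(text, style)
    return [place_glyph(toy_glyph(ch), b) for ch, b in zip(text, boxes)]
```

Calling `generate_text_line("abc", [Box(0.0, 0.0, 10.0, 12.0), Box(12.0, 0.0, 10.0, 12.0)])` returns three point lists, one per character, laid out left to right with the spacing of the two reference boxes. Passing a different `next_box` callable to `generate_layout` is where a trained layout model would plug in under this sketch.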