Font generation is a difficult and time-consuming task, especially for languages that use ideograms with complicated structures and a large number of characters, such as Chinese. To address this problem, few-shot and even one-shot font generation have attracted considerable attention. However, most existing font generation methods may still suffer from (i) a large cross-font gap; (ii) subtle cross-font variation; and (iii) incorrect generation of complicated characters. In this paper, we propose Diff-Font, a novel one-shot font generation method based on a diffusion model, which can be stably trained on large datasets. The proposed model aims to generate an entire font library from only one reference sample. Specifically, we construct a large stroke-wise dataset and propose a stroke-wise diffusion model to preserve the structure and completeness of each generated character. To the best of our knowledge, Diff-Font is the first work to apply diffusion models to the font generation task. The well-trained Diff-Font is not only robust to font gap and font variation, but also achieves promising performance on difficult character generation. Compared to previous font generation methods, our model reaches state-of-the-art performance both qualitatively and quantitatively.