Generating synthetic images of handwritten text in a writer-specific style is a challenging task, especially for unseen styles and new words, and even more so when these words contain characters that are rarely encountered during training. While emulating a writer's style has recently been addressed by generative models, generalization to rare characters has been overlooked. In this work, we devise a Transformer-based model for Few-Shot styled handwritten text generation and focus on obtaining a robust and informative representation of both the text and the style. In particular, we propose a novel representation of the textual content as a sequence of dense vectors obtained from images of the symbols rendered as standard GNU Unifont glyphs, which can be considered their visual archetypes. This strategy is better suited to generating characters that, although rarely seen during training, may share visual details with frequently observed ones. As for the style, we obtain a robust representation of unseen writers' calligraphy by exploiting specific pre-training on a large synthetic dataset. Quantitative and qualitative results demonstrate that our proposal generates words in unseen styles and with rare characters more faithfully than existing approaches that rely on independent one-hot encodings of the characters.
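The contrast between independent one-hot codes and glyph-based codes can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes toy 5x5 bitmaps in place of real GNU Unifont glyphs (which are 16 pixels tall and 8 or 16 pixels wide), and a random linear projection standing in for a learned embedding layer. All names (`TOY_GLYPHS`, `glyph_to_pixels`, `embed_text`, etc.) are hypothetical.

```python
import math
import random

# Hypothetical toy glyphs, NOT real Unifont data. '#' = ink, '.' = background.
# The rare character "ö" is drawn as the frequent "o" plus two dots on top.
TOY_GLYPHS = {
    "o": [".....", ".###.", ".#.#.", ".###.", "....."],
    "ö": [".#.#.", ".###.", ".#.#.", ".###.", "....."],
    "l": ["..#..", "..#..", "..#..", "..#..", "..#.."],
}


def glyph_to_pixels(rows):
    """Flatten a glyph bitmap into a list of 0/1 pixel values."""
    return [1.0 if ch == "#" else 0.0 for row in rows for ch in row]


def one_hot(char, alphabet):
    """Independent one-hot code: every character is orthogonal to every other."""
    return [1.0 if c == char else 0.0 for c in alphabet]


def make_projection(in_dim, out_dim, seed=0):
    """Random linear map; an assumed stand-in for a learned embedding layer."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weights, x):
    """Matrix-vector product: one output value per row of the weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def embed_text(text, weights):
    """Encode a word as a sequence of dense vectors, one per character glyph."""
    return [project(weights, glyph_to_pixels(TOY_GLYPHS[c])) for c in text]


if __name__ == "__main__":
    alphabet = list(TOY_GLYPHS)
    pixels = {c: glyph_to_pixels(g) for c, g in TOY_GLYPHS.items()}
    # One-hot codes of "o" and "ö" share nothing (cosine 0).
    print(cosine(one_hot("o", alphabet), one_hot("ö", alphabet)))
    # Glyph images of "o" and "ö" overlap heavily; "o" and "l" much less.
    print(cosine(pixels["o"], pixels["ö"]), cosine(pixels["o"], pixels["l"]))
    # A word becomes a sequence of dense vectors (here 2 vectors of size 8).
    W = make_projection(in_dim=25, out_dim=8)
    print([len(v) for v in embed_text("lo", W)])
```

Because the projection is linear and the toy "ö" bitmap equals "o" plus two dots, the embedding of the rare "ö" is the embedding of the frequent "o" plus a small offset, whereas its one-hot code carries no information shared with "o".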