Generating realistic-looking, readable images of handwritten text is a challenging task referred to as handwritten text generation (HTG). Given a string and examples from a writer, the goal is to synthesize an image of the correctly spelled word rendered in that writer's calligraphic style. An important application of HTG is generating training images to adapt downstream models to new data sets. Following their success in natural image generation, diffusion models (DMs) have become the state-of-the-art approach in HTG. In this work, we present an extension of a latent DM for HTG that enables the generation of writing styles not seen during training by learning the style conditioning with a masked autoencoder. Our proposed content encoder allows for different ways of conditioning the DM on textual and calligraphic features. Additionally, we employ classifier-free guidance and explore its influence on the quality of the generated training images. To adapt the model to a new, unlabeled data set, we propose a semi-supervised training scheme. We evaluate our approach on the IAM database and use the RIMES database to examine the generation of data not seen during training, achieving improvements in this particularly promising application of DMs for HTG.
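The classifier-free guidance mentioned above can be illustrated with a minimal, framework-free sketch. It assumes the standard formulation: during training, a single denoiser sees its condition randomly replaced by a null condition, and at sampling time the conditional and unconditional noise predictions are extrapolated by a guidance scale w, i.e. ε̂ = ε(x_t, ∅) + w · (ε(x_t, c) − ε(x_t, ∅)). The names `drop_condition`, `cfg_noise_estimate`, `p_uncond`, and `guidance_scale`, the denoiser signature, and the use of `None` as the null condition are all hypothetical. They are not taken from the authors' implementation, where the condition would combine textual and calligraphic features rather than a plain string.

```python
import random
from typing import Callable, List, Optional

Vector = List[float]
# Hypothetical denoiser signature: (noisy latent, timestep, condition or None) -> predicted noise
Denoiser = Callable[[Vector, int, Optional[str]], Vector]


def drop_condition(cond: str, p_uncond: float, rng: random.Random) -> Optional[str]:
    """Training-time step (assumed): replace the condition with the null
    condition (None) with probability p_uncond, so one network learns both
    the conditional and the unconditional noise prediction."""
    return None if rng.random() < p_uncond else cond


def cfg_noise_estimate(
    denoiser: Denoiser,
    x_t: Vector,
    t: int,
    cond: str,
    guidance_scale: float,
) -> Vector:
    """Sampling-time step: classifier-free guidance.

    guidance_scale = 0 gives the unconditional prediction, 1 gives the plain
    conditional prediction, and values > 1 push samples further towards the
    condition (e.g. the requested text and writer style).
    """
    eps_cond = denoiser(x_t, t, cond)    # prediction with conditioning
    eps_uncond = denoiser(x_t, t, None)  # prediction with the null condition
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]
```

Some works parameterize the same idea as (1 + w) · ε(x_t, c) − w · ε(x_t, ∅), which differs from the form above only by a shift of the scale by one; when comparing reported guidance values, check which convention is used.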