Semi-Supervised Adaptation of Diffusion Models for Handwritten Text Generation

Generating realistic-looking, readable images of handwritten text is a challenging task known as handwritten text generation (HTG). Given a string and examples from a writer, the goal is to synthesize an image depicting the correctly spelled word in the calligraphic style of the desired writer. An important application of HTG is generating training images to adapt downstream models to new data sets. Following their success in natural image generation, diffusion models (DMs) have become the state-of-the-art approach in HTG. In this work, we present an extension of a latent DM for HTG that enables the generation of writing styles not seen during training by learning style conditioning with a masked autoencoder. Our proposed content encoder allows for different ways of conditioning the DM on textual and calligraphic features. Additionally, we employ classifier-free guidance and explore its influence on the quality of the generated training images. For adapting the model to a new unlabeled data set, we propose a semi-supervised training scheme. We evaluate our approach on the IAM database and use the RIMES database to examine the generation of data not seen during training, achieving improvements in this particularly promising application of DMs for HTG.
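As a rough illustration of the classifier-free guidance step mentioned in the abstract, the minimal sketch below combines a conditional and an unconditional noise prediction. It assumes the common formulation `eps = eps_uncond + w * (eps_cond - eps_uncond)`; the abstract does not state the exact variant or guidance scale used, and the function name, argument names, and toy values are hypothetical.

```python
from typing import List, Sequence


def classifier_free_guidance(
    eps_cond: Sequence[float],
    eps_uncond: Sequence[float],
    guidance_scale: float,
) -> List[float]:
    """Blend conditional and unconditional noise predictions.

    Assumed convention: eps = eps_uncond + w * (eps_cond - eps_uncond).
    With w = 0 the result is purely unconditional, with w = 1 purely
    conditional, and w > 1 extrapolates toward the condition.
    """
    if len(eps_cond) != len(eps_uncond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for c, u in zip(eps_cond, eps_uncond)]


# Toy flattened noise predictions (hypothetical values, not model outputs).
eps_text_and_style = [1.0, 2.0]   # prediction with text/style conditioning
eps_unconditioned = [0.5, 1.0]    # prediction with the conditioning dropped
guided = classifier_free_guidance(eps_text_and_style, eps_unconditioned, 2.0)
```

In a real latent DM the two predictions would come from the same denoising network, evaluated once with and once without the conditioning input. Larger guidance scales typically trade sample diversity for closer adherence to the condition.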

