Styled Handwritten Text Generation (HTG) has received significant attention in recent years, propelled by the success of learning-based solutions employing GANs, Transformers, and, preliminarily, Diffusion Models. Despite this surge in interest, one critical aspect remains understudied: the impact of the input, both visual and textual, on HTG model training and, in turn, on performance. This study examines a state-of-the-art Styled-HTG approach and proposes strategies for input preparation and training regularization that improve both its performance and its generalization. These strategies are validated through extensive analysis across several settings and datasets. Beyond performance optimization, this work also addresses a significant hurdle in HTG research: the lack of a standardized evaluation protocol. In particular, we propose a standardized evaluation protocol for HTG and conduct a comprehensive benchmark of existing approaches. By doing so, we aim to establish a foundation for fair and meaningful comparisons between HTG strategies, fostering progress in the field.