Handwritten Text Recognition (HTR) is a well-established research area. In contrast, Handwritten Text Generation (HTG) is an emerging field with significant potential. HTG is challenging because handwriting styles vary widely between individuals, so generating realistic handwritten text requires a large and diverse dataset. However, such datasets are difficult to collect and are not readily available. Bengali is the fifth most spoken language in the world. While several studies exist for languages such as English and Arabic, Bengali handwritten text generation has received little attention. To address this gap, we propose a method for generating Bengali handwritten words. We collected and used our own dataset of Bengali handwriting samples, with contributions from approximately five hundred individuals of different ages and genders. All images were pre-processed to ensure consistency and quality. Our approach can produce diverse handwritten outputs from plain-text input. We believe this work contributes to the advancement of Bengali handwriting generation and can support further research in this area.