One of the challenges of handwriting recognition is transcribing a large number of vastly different writing styles. State-of-the-art approaches do not explicitly use information about the writer's style, which may limit overall accuracy because of various ambiguities. We explore models with writer-dependent parameters that take the writer's identity as an additional input. The proposed models can be trained on datasets whose partitions are each likely written by a single author (e.g., a single letter, diary, or chronicle). We propose the Writer Style Block (WSB), an adaptive instance normalization layer conditioned on learned embeddings of the partitions. We experimented with various placements and settings of the WSB and with contrastively pre-trained embeddings. We show that our approach outperforms a baseline without the WSB in a writer-dependent scenario and that it is possible to estimate embeddings for new writers. However, domain adaptation using simple finetuning in a writer-independent setting provides superior accuracy at a similar computational cost. The proposed approach should be further investigated in terms of training stability and embedding regularization in order to surpass that finetuning baseline.
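To make the idea of an adaptive instance normalization layer conditioned on a writer embedding concrete, the following is a minimal NumPy sketch and not the authors' implementation. It assumes a single feature map of shape (channels, height, width), one embedding vector per partition, and linear projections from the embedding to a per-channel scale and shift. The function name `writer_style_block`, the projection parameters `w_gamma`, `b_gamma`, `w_beta`, `b_beta`, and all sizes are hypothetical choices made only for illustration.

```python
import numpy as np


def writer_style_block(features, writer_embedding,
                       w_gamma, b_gamma, w_beta, b_beta, eps=1e-5):
    """Adaptive instance normalization conditioned on a writer embedding.

    features:         (C, H, W) feature map of one text-line image
    writer_embedding: (E,) embedding of the partition (assumed single writer)
    w_gamma, w_beta:  (C, E) projection matrices (hypothetical parameters)
    b_gamma, b_beta:  (C,) projection biases (hypothetical parameters)
    """
    # Instance normalization: per-channel statistics over the spatial axes.
    mean = features.mean(axis=(1, 2), keepdims=True)
    var = features.var(axis=(1, 2), keepdims=True)
    normalized = (features - mean) / np.sqrt(var + eps)

    # Writer-dependent scale and shift, predicted from the embedding.
    gamma = w_gamma @ writer_embedding + b_gamma
    beta = w_beta @ writer_embedding + b_beta
    return gamma[:, None, None] * normalized + beta[:, None, None]


# Toy setup: random values stand in for trained parameters.
rng = np.random.default_rng(0)
num_partitions, emb_dim, channels = 4, 8, 3
embeddings = rng.normal(size=(num_partitions, emb_dim))  # one row per partition
w_gamma = rng.normal(scale=0.1, size=(channels, emb_dim))
b_gamma = np.ones(channels)   # scale starts close to identity
w_beta = rng.normal(scale=0.1, size=(channels, emb_dim))
b_beta = np.zeros(channels)   # shift starts close to zero

features = rng.normal(size=(channels, 16, 64))
styled = writer_style_block(features, embeddings[2],
                            w_gamma, b_gamma, w_beta, b_beta)
print(styled.shape)
```

In this sketch, the same feature map passed with a different row of `embeddings` yields different per-channel statistics, which is how writer identity enters the model as an additional input. In a trained system the embeddings and projections would be learned jointly with the recognizer; for a new writer, one could hold the network fixed and optimize only a fresh embedding vector, although the abstract does not specify the estimation procedure.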