Despite significant recent advances in Handwritten Document Recognition (HDR), efficiently and accurately recognizing text against complex backgrounds, diverse handwriting styles, and varying document layouts remains a practical challenge. Moreover, this issue is seldom addressed in academic research, particularly in scenarios where little annotated data is available. In this paper, we introduce the DocTTT framework to address these challenges. The key innovation of our approach is the use of test-time training to adapt the model to each specific input during testing. We propose a novel Meta-Auxiliary learning approach that combines meta-learning with a self-supervised Masked Autoencoder~(MAE). During testing, we adapt the visual representation parameters using a self-supervised MAE loss. During training, we optimize the model parameters within a meta-learning framework, so that they are learned to adapt effectively to a new input. Experimental results show that our proposed method significantly outperforms existing state-of-the-art approaches on benchmark datasets.
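To make the test-time adaptation step concrete, the following is a minimal, hypothetical sketch (not the authors' implementation): a toy linear layer stands in for the visual representation parameters, and a few gradient steps on an MAE-style masked-reconstruction loss adapt it to a single test input before prediction. The masking ratio, learning rate, and number of steps are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear layer standing in for the visual-representation parameters;
# the actual model is a full handwritten-document recognizer.
W = rng.normal(scale=0.1, size=(16, 16))

def mae_step(W, x, lr=0.05, mask_ratio=0.5):
    """One self-supervised MAE-style step: mask random input features
    and reconstruct them; the loss is computed only on masked entries."""
    mask = rng.random(x.shape) < mask_ratio            # True = hidden
    x_vis = np.where(mask, 0.0, x)                     # visible part only
    recon = x_vis @ W                                  # reconstruction
    err = np.where(mask, recon - x, 0.0)               # error on masked
    n = max(int(mask.sum()), 1)
    loss = float((err ** 2).sum() / n)
    grad = 2.0 * x_vis.T @ err / n                     # dL/dW (manual)
    return W - lr * grad, loss                         # gradient step

x = rng.normal(size=(4, 16))    # one test input (small batch of patches)
losses = []
for _ in range(20):             # a few adaptation steps at test time
    W, loss = mae_step(W, x)
    losses.append(loss)
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

In the actual framework, the parameters being adapted are those of the shared visual encoder, and meta-learning during training chooses an initialization from which these few self-supervised steps are effective; this sketch only illustrates the adaptation loop itself.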