SAGHOG: Self-Supervised Autoencoder for Generating HOG Features for Writer Retrieval

This paper introduces SAGHOG, a self-supervised pretraining strategy for writer retrieval using HOG features of the binarized input image. Our preprocessing applies Segment Anything to extract handwriting from various datasets, yielding about 24k documents. A vision transformer is then trained to reconstruct masked patches of the handwriting. SAGHOG is finetuned by appending NetRVLAD as an encoding layer to the pretrained encoder. Evaluation on three historical datasets, Historical-WI, HisFrag20, and GRK-Papyri, demonstrates the effectiveness of SAGHOG for writer retrieval. Additionally, we provide ablation studies on our architecture and evaluate unsupervised and supervised finetuning. Notably, on HisFrag20, SAGHOG outperforms related work with a mAP of 57.2%, a margin of 11.6% over the current state of the art, showcasing its robustness on challenging data. It is also competitive even on small datasets, e.g. GRK-Papyri, where we achieve a Top-1 accuracy of 58.0%.
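The abstract reports retrieval quality as mean average precision (mAP) and Top-1 accuracy. As a rough illustration of how such numbers are commonly computed, the sketch below treats each document as a query against all other documents, ranks them by cosine similarity of their global descriptors, and scores the ranking against the writer labels. This leave-one-out protocol is an assumption for illustration, not SAGHOG's evaluation code; the function name `writer_retrieval_metrics` is hypothetical, and cosine similarity (rather than another distance or a re-ranking step) is likewise assumed.

```python
import numpy as np


def writer_retrieval_metrics(descriptors, writer_ids):
    """Leave-one-out writer retrieval: every document queries all others.

    Assumption (not from the paper): descriptors are compared by cosine
    similarity. Returns (mAP, Top-1) as fractions in [0, 1].
    """
    x = np.asarray(descriptors, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # L2-normalise rows
    labels = np.asarray(writer_ids)
    sim = x @ x.T  # cosine similarity matrix
    n = len(labels)
    average_precisions, top1_hits = [], []
    for q in range(n):
        others = np.arange(n) != q  # exclude the query itself
        order = np.argsort(-sim[q, others], kind="stable")  # most similar first
        relevant = labels[others][order] == labels[q]
        if not relevant.any():
            continue  # writer has no other document: AP undefined, skip
        top1_hits.append(relevant[0])
        hits = np.cumsum(relevant)  # relevant items seen up to each rank
        ranks = np.arange(1, len(relevant) + 1)
        # AP = mean of precision@k over the ranks k that hold a relevant item
        average_precisions.append(np.mean(hits[relevant] / ranks[relevant]))
    return float(np.mean(average_precisions)), float(np.mean(top1_hits))


# Toy example: two writers, two documents each.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(writer_retrieval_metrics(docs, [0, 0, 1, 1]))  # (1.0, 1.0)
```

The function returns fractions; multiply by 100 to compare with the percentages quoted in the abstract. Queries whose writer has no other document in the set are skipped, since average precision is undefined for them.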



Autoencoder


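An autoencoder is a type of artificial neural network used to learn efficient data codings in an unsupervised manner. Its aim is to learn a representation (encoding) for a set of data, typically for dimensionality reduction, by training the network to ignore signal "noise". Along with the reduction side, a reconstructing side is learned, where the autoencoder tries to generate from the reduced encoding a representation as close as possible to its original input, hence its name. Several variants of the basic model exist, with the aim of forcing the learned representations of the input to assume useful properties. Autoencoders are used effectively to solve many applied problems, from face recognition to capturing the semantic meaning of words.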
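The reduce-then-reconstruct idea can be illustrated with the simplest possible autoencoder: a linear encoder and a linear decoder trained by plain gradient descent on the mean squared reconstruction error. This is a generic NumPy sketch, not SAGHOG's architecture (which, per the abstract, is a vision transformer trained to reconstruct masked patches); the function name `train_linear_autoencoder`, the learning rate, the initialisation scale, and the epoch count are hypothetical choices.

```python
import numpy as np


def train_linear_autoencoder(data, code_dim, lr=0.05, epochs=2000, seed=0):
    """Fit encoder w_enc (d x k) and decoder w_dec (k x d) by gradient descent
    on the mean squared reconstruction error of the centred data."""
    rng = np.random.default_rng(seed)
    x = np.asarray(data, dtype=float)
    x = x - x.mean(axis=0)  # centre each feature
    d = x.shape[1]
    # Hypothetical small random initialisation for both linear maps.
    w_enc = rng.normal(scale=0.1, size=(d, code_dim))
    w_dec = rng.normal(scale=0.1, size=(code_dim, d))
    losses = []
    for _ in range(epochs):
        code = x @ w_enc  # reduction: n x k encoding
        recon = code @ w_dec  # reconstruction: n x d, should approximate x
        err = recon - x
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the mean squared error through both linear maps.
        grad_recon = 2.0 * err / err.size
        grad_dec = code.T @ grad_recon
        grad_enc = x.T @ (grad_recon @ w_dec.T)
        w_dec -= lr * grad_dec
        w_enc -= lr * grad_enc
    return w_enc, w_dec, losses


# Toy data: 200 samples in 5 dimensions that actually lie on a 2-D subspace.
rng = np.random.default_rng(1)
samples = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5))
w_enc, w_dec, losses = train_linear_autoencoder(samples, code_dim=2)
print(f"MSE: {losses[0]:.4f} -> {losses[-1]:.6f}")
```

With linear maps and squared error, the best such autoencoder spans the same subspace as the top principal components. The variants mentioned above obtain more useful representations by adding nonlinear layers or by corrupting the input, for example by masking parts of it as in masked-patch pretraining.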
