This paper presents a comparative study of near-duplicate image detection techniques in a real-world use case, in which a document management company is commissioned to manually annotate a collection of scanned photographs. Detecting duplicate and near-duplicate photographs can reduce the time archivists spend on manual annotation. Unlike laboratory settings, this use case makes the deployment dataset available in advance, which allows the use of transductive learning. We propose a transductive learning approach that leverages state-of-the-art deep learning architectures such as convolutional neural networks (CNNs) and Vision Transformers (ViTs). The approach pre-trains a deep neural network on a large dataset and then fine-tunes it on the unlabeled target collection with self-supervised learning. The results show that the proposed approach outperforms the baseline methods at near-duplicate image detection on the UKBench dataset and on an in-house private dataset.
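As an illustration of how detection could proceed once a fine-tuned network has produced one embedding per photograph, the following minimal sketch flags candidate near-duplicate pairs by thresholding cosine similarity. The abstract does not specify the matching step, so the similarity measure, the threshold value, the function names (`cosine_similarity`, `find_near_duplicates`), and the toy embeddings are all assumptions made for illustration rather than the method evaluated in the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_near_duplicates(
    embeddings: Dict[str, Sequence[float]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Return (name_a, name_b, score) for every pair scoring at or above the threshold.

    Hypothetical helper: compares all pairs, which is quadratic in the number
    of images and only suitable for small collections.
    """
    names = sorted(embeddings)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = cosine_similarity(embeddings[first], embeddings[second])
            if score >= threshold:
                pairs.append((first, second, score))
    # Most similar pairs first, so an archivist can review the likeliest duplicates.
    pairs.sort(key=lambda pair: pair[2], reverse=True)
    return pairs


# Toy embeddings (assumed values); in practice these would come from the
# fine-tuned CNN or ViT applied to each scanned photograph.
toy_embeddings = {
    "scan_001.png": [0.90, 0.10, 0.05],
    "scan_002.png": [0.88, 0.12, 0.07],
    "scan_003.png": [0.05, 0.20, 0.95],
}

candidates = find_near_duplicates(toy_embeddings, threshold=0.95)
for first, second, score in candidates:
    print(f"{first} ~ {second}: {score:.3f}")
```

Because the whole target collection is known in advance in the transductive setting, all embeddings can be computed once and compared offline. For larger collections, an approximate nearest-neighbour index would likely replace the exhaustive pairwise loop used here.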