Tracking geographic entities such as buildings across historical maps offers valuable insight into cultural heritage, urbanization patterns, environmental change, and other questions in historical research. However, linking these entities across diverse maps remains a persistent challenge. Traditionally, it has been addressed in two steps: detecting entities within individual maps, then associating them through a heuristic-based post-processing step. In this paper, we propose a novel approach that combines segmentation and association of geographic entities in historical maps using video instance segmentation (VIS). This method significantly streamlines geographic entity alignment and enhances automation. However, acquiring high-quality, video-format training data for VIS models is prohibitively expensive, especially for historical maps, which often contain hundreds or thousands of geographic entities. To mitigate this challenge, we explore self-supervised learning (SSL) techniques to enhance VIS performance on historical maps. We evaluate VIS models under different pretraining configurations and introduce a novel method for generating synthetic videos from unlabeled historical map images for pretraining. The proposed self-supervised VIS method substantially reduces the need for manual annotation. Experimental results show that it outperforms a model trained from scratch, with a 24.9% improvement in AP and a 0.23 increase in F1 score.