Historical map legends are critical for interpreting cartographic symbols, but their inconsistent layouts and unstructured formats make automatic extraction challenging. Prior work focuses primarily on segmentation or general optical character recognition (OCR), and few methods match legend symbols to their corresponding descriptions in a structured manner. We present a method that combines LayoutLMv3 for layout detection with GPT-4o using in-context learning to detect and link legend items and their descriptions via bounding box predictions. Our experiments show that GPT-4o with structured JSON prompts outperforms the baseline, achieving 88% F1 and 85% IoU, and reveal how prompt design, example counts, and layout alignment affect performance. This approach supports scalable, layout-aware legend parsing and improves the indexing and searchability of historical maps across various visual styles.
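The reported metrics, IoU and F1 over predicted bounding boxes, can be illustrated with a minimal sketch. The abstract does not specify the matching protocol, so the greedy one-to-one matching and the 0.5 IoU threshold below are assumptions for illustration only, and the names `Box`, `iou`, and `f1_at_threshold` are hypothetical helpers rather than part of the described method.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates (hypothetical type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(box: Box) -> float:
    # Clamp at zero so inverted or non-overlapping boxes have no area.
    return max(0.0, box.x_max - box.x_min) * max(0.0, box.y_max - box.y_min)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    overlap = Box(
        max(a.x_min, b.x_min),
        max(a.y_min, b.y_min),
        min(a.x_max, b.x_max),
        min(a.y_max, b.y_max),
    )
    inter = area(overlap)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def f1_at_threshold(predicted: List[Box], truth: List[Box],
                    threshold: float = 0.5) -> float:
    """F1 with greedy one-to-one matching (assumed protocol, not the paper's).

    A prediction counts as a true positive if its best still-unmatched
    ground-truth box has IoU >= threshold.
    """
    matched = set()
    true_positives = 0
    for pred in predicted:
        best_idx, best_score = None, threshold
        for idx, gt in enumerate(truth):
            if idx in matched:
                continue
            score = iou(pred, gt)
            if score >= best_score:
                best_idx, best_score = idx, score
        if best_idx is not None:
            matched.add(best_idx)
            true_positives += 1
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)
```

For two 2x2 boxes offset by one pixel in each direction, the intersection is 1 and the union is 7, so `iou` returns 1/7. An evaluation for linked legend items would also need to check that each symbol box is paired with the correct description box, which this sketch leaves out.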