Decades of engineering drawings and technical records remain locked in legacy archives with inconsistent or missing metadata, making retrieval difficult and often manual. We present Blueprint, a layout-aware multimodal retrieval system designed for large-scale engineering repositories. Blueprint detects canonical drawing regions, applies region-restricted VLM-based OCR, normalizes identifiers (e.g., DWG, part, facility), and fuses lexical and dense retrieval with a lightweight region-level reranker. Deployed on ~770k unlabeled files, it automatically produces structured metadata suitable for cross-facility search. We evaluate Blueprint on a 5k-file benchmark with 350 expert-curated queries using pooled, graded (0/1/2) relevance judgments. Blueprint delivers a 10.1% absolute gain in Success@3 and an 18.9% relative improvement in nDCG@3 over the strongest vision-language baseline, and it consistently outperforms that baseline on vision, text, and multimodal query intents. Oracle ablations reveal substantial headroom under perfect region detection and OCR. We release all queries, runs, annotations, and code to facilitate reproducible evaluation on legacy engineering archives.
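To make the reported metrics concrete, the following is a minimal sketch of how Success@3 and nDCG@3 could be computed from graded (0/1/2) judgments, and of how an absolute gain differs from a relative improvement. It is not the released evaluation code. The function names, the linear gain (grade used directly as gain), the choice to count any grade of 1 or higher as a success, and treating unjudged documents as grade 0 are all assumptions made for illustration.

```python
import math


def success_at_k(ranked_ids, grades, k=3, min_grade=1):
    """Return 1.0 if any of the top-k results has grade >= min_grade, else 0.0.

    Assumption: unjudged documents count as grade 0.
    """
    return 1.0 if any(grades.get(d, 0) >= min_grade for d in ranked_ids[:k]) else 0.0


def dcg_at_k(gains, k):
    """Discounted cumulative gain with a log2 rank discount (rank 1 has discount 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, grades, k=3):
    """nDCG@k with linear gain (assumption: gain equals the 0/1/2 grade)."""
    gains = [grades.get(d, 0) for d in ranked_ids[:k]]
    ideal = sorted(grades.values(), reverse=True)
    idcg = dcg_at_k(ideal, k)
    return dcg_at_k(gains, k) / idcg if idcg > 0 else 0.0


def absolute_gain(new, base):
    """Difference between two scores, e.g. 0.60 vs 0.50 gives 0.10."""
    return new - base


def relative_gain(new, base):
    """Difference as a fraction of the baseline, e.g. 0.60 vs 0.50 gives 0.20."""
    return (new - base) / base


if __name__ == "__main__":
    # Hypothetical judgments for one query: document id -> grade.
    grades = {"a": 2, "b": 1, "c": 0}
    run = ["c", "a", "b"]
    print("Success@3:", success_at_k(run, grades))
    print("nDCG@3:", round(ndcg_at_k(run, grades), 4))
```

Per-query scores would then be averaged over the 350 queries; an absolute gain is reported on that averaged score directly, while a relative improvement divides the difference by the baseline's average.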