Multimodal Large Language Models (MLLMs) show promise for general industrial quality inspection but fall short in complex scenarios such as Printed Circuit Board (PCB) inspection, which poses unique challenges due to densely packed components, intricate wiring structures, and subtle defect patterns that demand specialized domain expertise. However, a high-quality, unified vision-language benchmark for quantitatively evaluating MLLMs across PCB inspection tasks remains absent, a gap stemming not only from limited data availability but also from fragmented datasets and inconsistent annotation standards. To fill this gap, we propose UniPCB, the first unified vision-language benchmark for open-ended PCB quality inspection. UniPCB is built via a systematic pipeline that curates and standardizes data from disparate sources across three annotated scenarios. Furthermore, we introduce PCB-GPT, an MLLM trained on a new instruction dataset generated by this pipeline, using a novel progressive curriculum that mimics how human experts learn. Evaluations on UniPCB show that while existing MLLMs falter on domain-specific tasks, PCB-GPT establishes a new baseline. Notably, it more than doubles the performance of the strongest competitor on fine-grained defect localization, with clear advantages in both localization and analysis. We will release the instruction data, benchmark, and model to facilitate future research.