A Digital Twin (DT) replicates objects, processes, or systems for real-time monitoring, simulation, and predictive maintenance. Recent advances such as Large Language Models (LLMs) have transformed traditional AI systems and offer considerable potential when combined with DTs in industrial applications such as railway defect inspection. Traditionally, this inspection requires extensive defect samples to identify patterns, and limited samples can lead to overfitting and poor performance on unseen defects. Integrating pre-trained LLMs into a DT addresses this challenge by reducing the need for large amounts of sample data. We introduce DefectTwin, which employs a multimodal and multi-model (M^2) LLM-based AI pipeline to analyze both seen and unseen visual defects in railways. This application enables a railway agent to perform expert-level defect analysis using consumer electronics (e.g., tablets). A multimodal processor ensures responses are in a consumable format, while an instant user feedback mechanism (instaUF) enhances Quality-of-Experience (QoE). The proposed M^2 LLM outperforms existing models, achieving high precision (0.76-0.93) across multimodal inputs including text, images, and videos of pre-trained defects, and demonstrates superior zero-shot generalizability for unseen defects. We also evaluate the latency, token count, and usefulness of responses generated by DefectTwin on consumer devices. To our knowledge, DefectTwin is the first LLM-integrated DT designed for railway defect inspection.