Fault diagnostics and recovery in smart factories are challenging because critical information is dispersed across the manuals of multiple machines that are interconnected through the manufacturing process. Large Language Models (LLMs) offer a promising approach. In this paper, we propose FactoryLLM, a safe, open-source AI playground for evaluating LLM-based retrieval-augmented generation (RAG) models on documents from multiple machines across the manufacturing process. FactoryLLM lets the user configure the LLM and assess its performance when reasoning over multiple documents through a dual evaluation setup that uses both RAGAS and NVIDIA's LLM-as-a-Judge metrics. FactoryLLM is safe because it allows users to run local or open-source LLMs without sharing sensitive industrial data, providing a controlled environment for experimentation. We demonstrate the efficacy of FactoryLLM through a case study involving an Autonomous Intelligent Vehicle and its Mobile Planner software, evaluating three LLMs on 30 maintenance queries derived from approximately 600 pages of cross-machine documentation. The results suggest that FactoryLLM is effective for cross-machine document reasoning: every model achieved a groundedness score above 0.88. The full code and documentation are publicly available so that the community can test FactoryLLM on their own manufacturing-specific scenarios.