A Multimodal Manufacturing Safety Chatbot: Knowledge Base Design, Benchmark Development, and Evaluation of Multiple RAG Approaches

Ryan Singh, Austin Hamilton, Amanda White, Michael Wise, Ibrahim Yousif, Arthur Carvalho, Zhe Shan, Reza Abrisham Baf, Mohammad Mayyas, Lora A. Cavuoto, Fadel M. Megahed

From arXiv, 25 pages, 5 figures

Ensuring worker safety remains a critical challenge in modern manufacturing environments. Industry 5.0 reorients the prevailing manufacturing paradigm toward more human-centric operations. Using a design science research methodology, we identify three essential requirements for next-generation safety training systems: high accuracy, low latency, and low cost. We introduce a multimodal chatbot, powered by large language models, that meets these design requirements. The chatbot uses retrieval-augmented generation (RAG) to ground its responses in curated regulatory and technical documentation. To evaluate our solution, we developed a domain-specific benchmark of expert-validated question-answer pairs for three representative machines: a Bridgeport manual mill, a Haas TL-1 CNC lathe, and a Universal Robots UR5e collaborative robot. We tested 24 RAG configurations in a full-factorial design and assessed them with automated evaluations of correctness, latency, and cost. The top two configurations were then evaluated by ten industry experts and academic researchers. Our results show that retrieval strategy and model configuration have a significant impact on performance. The top configuration, selected for chatbot deployment, achieved an accuracy of 86.66%, an average cost of $0.005 per query, and an average end-to-end latency of 10.04 seconds. This latency is measured from query submission to delivery of the full instruction, not to the onset of generation, and is practical for delivering a complete safety instruction. Overall, our work makes three contributions: an open-source, domain-grounded safety training chatbot; a validated benchmark for evaluating AI-assisted safety instruction; and a systematic methodology for designing and assessing AI-enabled instructional and immersive safety training systems for Industry 5.0 environments.
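As a rough illustration of the full-factorial evaluation described above, the following minimal sketch enumerates a 24-cell configuration grid and aggregates per-query results into the three reported metrics. The abstract does not list the actual factors or their levels, so the factor names, the level labels, the 3 x 4 x 2 split, and the record fields (`correct`, `latency_s`, `cost_usd`) are all hypothetical placeholders rather than the authors' design.

```python
from itertools import product
from statistics import mean

# Hypothetical factors and levels: the abstract only states that 24
# configurations were tested, not which factors produced them.
FACTORS = {
    "retrieval_strategy": ["retrieval_a", "retrieval_b", "retrieval_c"],
    "llm": ["llm_a", "llm_b", "llm_c", "llm_d"],
    "setting": ["setting_a", "setting_b"],
}


def full_factorial(factors):
    """Return one dict per combination of factor levels."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


def summarize(records):
    """Aggregate per-query records into accuracy, mean latency, and mean cost.

    Each record is assumed (hypothetically) to carry a boolean `correct`,
    an end-to-end `latency_s` in seconds, and a `cost_usd` per query.
    """
    return {
        "accuracy": mean(1.0 if r["correct"] else 0.0 for r in records),
        "latency_s": mean(r["latency_s"] for r in records),
        "cost_usd": mean(r["cost_usd"] for r in records),
    }


configs = full_factorial(FACTORS)
print(len(configs))  # 3 * 4 * 2 = 24
```

In a full-factorial design every level of each factor is crossed with every level of the others, so the effect of retrieval strategy can be separated from the effect of model configuration when comparing the summaries.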
