Programmable Logic Controllers (PLCs) are programmed in proprietary code dialects, which makes it challenging to train coding assistants for them. Current LLMs are trained on large code datasets and can write IEC 61131-3 compatible code out of the box, but they know neither the specific function blocks nor the related project code. Moreover, companies like Mitsubishi Electric and their customers do not trust cloud providers, so having their own coding agent is the desired solution. In this study, we present our work on a coding assistant for this low-data domain, intended for industrial use. We show how we achieved high-quality code generation without fine-tuning large models, and by fine-tuning small local models for use on edge devices. Our tool lets several AI models compete with each other, uses reasoning, corrects bugs automatically, and checks code validity by compiling it directly in the chat interface. We support our approach with an extensive evaluation that includes code compilation statistics and user ratings. We found that a coding assistant supported by Retrieval-Augmented Generation (RAG) can work in low-data domains when it relies on extensive prompt engineering and directed retrieval.
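The workflow summarized above (retrieval-augmented prompting, several competing models, compile checks, and automatic bug correction) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how candidates are ranked or how repairs are prompted, so the selection rule (fewest compiler errors, then fewest attempts), the repair prompt layout, and every name below (`build_prompt`, `generate_with_repair`, `best_candidate`, `toy_compiler`, `model_a`, `model_b`) are assumptions made for illustration. The models and the compiler are stand-in Python functions, not a real LLM or PLC toolchain.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical interfaces: a "model" maps a prompt to code, and a "compiler"
# maps code to a list of error messages (an empty list means it compiled).
Model = Callable[[str], str]
Compiler = Callable[[str], List[str]]


@dataclass
class Candidate:
    model_name: str
    code: str
    errors: List[str]
    attempts: int


def build_prompt(task: str, retrieved: List[str]) -> str:
    """Prepend retrieved documentation snippets to the user task."""
    context = "\n".join(f"- {snippet}" for snippet in retrieved)
    return f"Context:\n{context}\n\nTask:\n{task}"


def generate_with_repair(name: str, model: Model, compiler: Compiler,
                         prompt: str, max_repairs: int = 2) -> Candidate:
    """Generate code, then feed compiler errors back for up to max_repairs retries."""
    code = model(prompt)
    errors = compiler(code)
    attempts = 1
    while errors and attempts <= max_repairs:
        repair_prompt = (f"{prompt}\n\nPrevious code:\n{code}\n\n"
                         "Compiler errors:\n" + "\n".join(errors))
        code = model(repair_prompt)
        errors = compiler(code)
        attempts += 1
    return Candidate(name, code, errors, attempts)


def best_candidate(models: Dict[str, Model], compiler: Compiler,
                   prompt: str) -> Candidate:
    """Let all models compete; prefer fewer errors, then fewer attempts (assumed rule)."""
    candidates = [generate_with_repair(n, m, compiler, prompt)
                  for n, m in models.items()]
    return min(candidates, key=lambda c: (len(c.errors), c.attempts))


# --- Stand-ins for demonstration only ---
def toy_compiler(code: str) -> List[str]:
    """Toy check: an IF statement must be closed with END_IF."""
    return [] if "END_IF;" in code else ["missing END_IF"]


def model_a(prompt: str) -> str:
    """A model that never fixes its mistake."""
    return "IF x THEN y := 1;"


def model_b(prompt: str) -> str:
    """A model that corrects itself once it sees compiler errors."""
    if "Compiler errors:" in prompt:
        return "IF x THEN y := 1; END_IF;"
    return "IF x THEN y := 1;"
```

With these stand-ins, calling `best_candidate({"model_a": model_a, "model_b": model_b}, toy_compiler, build_prompt("Set y when x is true", ["IF statements end with END_IF;"]))` selects the output of `model_b`, because it compiles after one repair round while `model_a` still fails after its retries. In a real setup the compiler callable would wrap the vendor's PLC toolchain and the retrieved snippets would come from function block documentation and project code.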