Retrieval-Augmented Fine-Tuning With Preference Optimization For Visual Program Generation

Visual programming languages (VPLs) let users create programs through graphical interfaces, which makes programming more accessible and has led to their widespread use across domains. To further improve accessibility, recent research has focused on generating VPL code from user instructions with large language models (LLMs), and prompting-based methods in particular have shown promising results. Nevertheless, such approaches can be less effective for industrial VPLs such as Ladder Diagram (LD). LD is a pivotal language in industrial automation and involves extensive domain-specific configurations that are difficult to capture in a single prompt. In this work, we demonstrate that training-based methods outperform prompting-based methods in LD generation accuracy, even with smaller backbone models. Building on this finding, we propose a two-stage training strategy to further enhance VPL generation. First, we employ retrieval-augmented fine-tuning to exploit the repeated use of subroutines that is common in industrial VPLs. Second, we apply direct preference optimization (DPO) to further guide the model toward accurate outputs, using preference pairs generated systematically through graph editing operations. Extensive experiments on real-world LD data show that our approach improves program-level accuracy by over 10% compared to supervised fine-tuning, highlighting its potential to advance industrial automation.
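As a rough illustration of how preference pairs might be produced through graph editing operations, the sketch below perturbs a ground-truth program graph to obtain "rejected" outputs. Everything in it is an assumption made for illustration rather than the method used in this work: the `ProgramGraph` structure, the `serialize` text format, the two edit operations (`relabel_node`, `delete_edge`), the label vocabulary, and the `prompt`/`chosen`/`rejected` record layout are all hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ProgramGraph:
    """Hypothetical program graph: labeled nodes plus directed edges."""
    nodes: tuple  # tuple of (node_id, label)
    edges: tuple  # tuple of (src_id, dst_id)


def serialize(g):
    """Turn a graph into a canonical text string (assumed target format)."""
    node_part = ";".join(f"{i}:{label}" for i, label in sorted(g.nodes))
    edge_part = ";".join(f"{s}->{d}" for s, d in sorted(g.edges))
    return f"nodes[{node_part}] edges[{edge_part}]"


def relabel_node(g, rng, vocab):
    """Edit operation: replace one node's label with a different one."""
    idx = rng.randrange(len(g.nodes))
    node_id, old = g.nodes[idx]
    new = rng.choice([label for label in vocab if label != old])
    nodes = g.nodes[:idx] + ((node_id, new),) + g.nodes[idx + 1:]
    return ProgramGraph(nodes, g.edges)


def delete_edge(g, rng):
    """Edit operation: remove one edge (requires at least one edge)."""
    idx = rng.randrange(len(g.edges))
    return ProgramGraph(g.nodes, g.edges[:idx] + g.edges[idx + 1:])


def make_preference_pairs(prompt, gold, vocab, n_pairs=3, seed=0):
    """Pair the gold program (chosen) with distinct edited variants (rejected)."""
    rng = random.Random(seed)
    chosen = serialize(gold)
    seen = {chosen}
    pairs = []
    for _ in range(n_pairs * 20):  # bounded number of attempts
        if len(pairs) == n_pairs:
            break
        op = rng.choice(["relabel", "delete_edge"])
        if op == "delete_edge" and not gold.edges:
            continue  # nothing to delete
        if op == "relabel":
            edited = relabel_node(gold, rng, vocab)
        else:
            edited = delete_edge(gold, rng)
        rejected = serialize(edited)
        if rejected in seen:
            continue  # skip duplicates and anything identical to the gold
        seen.add(rejected)
        pairs.append({"prompt": prompt, "chosen": chosen, "rejected": rejected})
    return pairs


# Toy example: two contacts in series driving a coil (labels are illustrative).
gold = ProgramGraph(
    nodes=((0, "contact"), (1, "contact"), (2, "coil")),
    edges=((0, 1), (1, 2)),
)
pairs = make_preference_pairs(
    "Turn on the output when both inputs are active.",
    gold,
    vocab=["contact", "coil", "timer"],
)
for pair in pairs:
    print(pair["rejected"])
```

Each rejected sample in this sketch differs from the gold program by exactly one edit, so the pairs isolate small structural errors. A DPO trainer would then consume such records; how edits are chosen and how many are applied per sample are design decisions this sketch does not attempt to reproduce.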
