Skill Discovery for Software Scripting Automation via Offline Simulations with LLMs

Scripting interfaces enable users to automate tasks and customize software workflows, but creating scripts traditionally requires programming expertise and familiarity with specific APIs, which poses barriers for many users. While Large Language Models (LLMs) can generate code from natural language queries, runtime code generation is severely limited by unverified code, security risks, longer response times, and higher computational costs. To bridge this gap, we propose an offline simulation framework that curates a software-specific skillset, a collection of verified scripts, by exploiting LLMs and publicly available scripting guides. Our framework comprises two components: (1) task creation, which uses top-down functionality guidance and bottom-up API synergy exploration to generate helpful tasks; and (2) skill generation with trials, which refines and validates scripts based on execution feedback. To efficiently navigate the extensive API landscape, we introduce a Graph Neural Network (GNN)-based link prediction model that captures API synergy, enabling the generation of skills involving underutilized APIs and expanding the skillset's diversity. Experiments with Adobe Illustrator demonstrate that, compared to traditional runtime code generation, our framework significantly improves automation success rates, reduces response time, and saves runtime token costs. This is the first attempt to use software scripting interfaces as a testbed for LLM-based systems; it highlights the advantages of leveraging execution feedback in a controlled environment and offers valuable insights into aligning AI capabilities with user needs in specialized software domains.
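The "skill generation with trials" component can be pictured as a generate, execute, refine loop: draft a script, run it, and feed any execution error back into the next draft until the script passes or the trial budget runs out. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. `draft_script` stands in for an LLM call, `execute` stands in for running the script in the target application and capturing errors, and `generate_skill`, `Skill`, `ExecutionResult`, and `max_trials` are hypothetical names. The stub LLM and stub executor at the bottom are purely illustrative and do not reproduce real Illustrator error messages.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExecutionResult:
    success: bool
    error: str = ""


@dataclass
class Skill:
    task: str
    script: str
    attempts: int


def generate_skill(
    task: str,
    draft_script: Callable[[str, Optional[str], Optional[str]], str],
    execute: Callable[[str], ExecutionResult],
    max_trials: int = 3,
) -> Optional[Skill]:
    """Hypothetical trial loop: draft, execute, and refine using error feedback."""
    script: Optional[str] = None
    feedback: Optional[str] = None
    for attempt in range(1, max_trials + 1):
        # Ask the (stand-in) LLM for a script, passing the previous script and its error.
        script = draft_script(task, script, feedback)
        result = execute(script)
        if result.success:
            # Only scripts that executed successfully are kept as verified skills.
            return Skill(task=task, script=script, attempts=attempt)
        feedback = result.error
    # The task never yielded a verified script within the budget, so it is discarded.
    return None


def fake_llm(task: str, previous: Optional[str], error: Optional[str]) -> str:
    # Illustrative stub: the first draft lacks a guard; the retry adds one after seeing the error.
    if error is None:
        return "app.activeDocument.selection[0].opacity = 50;"
    return "if (app.documents.length > 0) { app.activeDocument.selection[0].opacity = 50; }"


def fake_executor(script: str) -> ExecutionResult:
    # Illustrative stub: pretends no document is open unless the script checks for one.
    if "app.documents.length" not in script:
        return ExecutionResult(False, "Error: no open document")
    return ExecutionResult(True)


if __name__ == "__main__":
    skill = generate_skill("Set the selected item's opacity to 50%", fake_llm, fake_executor)
    print(skill.attempts if skill else "no verified skill")  # prints 2
```

Capping the loop at `max_trials` reflects the offline setting: failed tasks are simply dropped from the skillset rather than retried indefinitely, so only executed-and-verified scripts are ever served at runtime.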

翻译：脚本接口使用户能够自动化任务并定制软件工作流，但传统上创建脚本需要编程专业知识和对特定API的熟悉度，这对许多用户构成了障碍。虽然大型语言模型（LLMs）能够根据自然语言查询生成代码，但由于未经验证的代码、安全风险、较长的响应时间以及较高的计算成本，运行时代码生成受到严重限制。为弥合这一差距，我们提出了一种离线仿真框架，通过利用LLMs和公开可用的脚本指南，来构建一个软件特定的技能集——即一组经过验证的脚本。我们的框架包含两个组成部分：(1) 任务创建，采用自上而下的功能指导和自下而上的API协同探索来生成有用的任务；(2) 带尝试的技能生成，基于执行反馈来优化和验证脚本。为了在广阔的API领域中高效导航，我们引入了一种基于图神经网络（GNN）的链接预测模型来捕捉API协同效应，从而能够生成涉及未充分利用API的技能，并扩展技能集的多样性。在Adobe Illustrator上的实验表明，与传统运行时代码生成相比，我们的框架显著提高了自动化成功率，减少了响应时间，并节省了运行时令牌成本。这是首次尝试将软件脚本接口作为基于LLM系统的测试平台，突显了在受控环境中利用执行反馈的优势，并为在专业软件领域中将AI能力与用户需求对齐提供了宝贵的见解。