Library migration is a common but error-prone task in software development. Developers may need to replace one library with another for reasons such as changing requirements or licensing changes. Migration typically entails manually updating and rewriting source code. While automated migration tools exist, most rely on mining examples from real-world projects that have already undergone similar migrations. However, such data are scarce, and collecting them for arbitrary pairs of libraries is difficult. Moreover, these tools often fail to leverage modern code transformation infrastructure. In this paper, we present a new approach to automated API migration that sidesteps these limitations. Instead of relying on existing migration data or using large language models (LLMs) directly for transformation, we use LLMs to extract migration examples. Next, an agent generalizes those examples into reusable transformation scripts for PolyglotPiranha, a modern code transformation tool. Our method distills latent migration knowledge from LLMs into structured, testable, and repeatable migration logic, without requiring preexisting corpora or manual engineering effort. Experimental results across several Python libraries show that our system can generate diverse migration examples and synthesize transformation scripts that generalize to real-world codebases.
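The core idea, generalizing concrete before/after migration examples into a reusable rule and then checking that rule against those same examples, can be illustrated with a toy sketch. It is not PolyglotPiranha's rule language or API, nor the authors' implementation. The `:[name]` hole syntax, the `MigrationExample` and `RewriteRule` classes, the `validate` helper, and the `os.path.join` to `pathlib` migration are all hypothetical choices for illustration. Holes here match only simple identifier-like text through regular expressions rather than syntax trees, and each hole is assumed to appear at most once in a match template.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationExample:
    """A before/after pair, e.g. as an LLM might propose (hypothetical format)."""
    before: str
    after: str


@dataclass(frozen=True)
class RewriteRule:
    """Toy template rule: each `:[name]` hole matches identifier-like text.

    Hypothetical illustration only; real tools match on syntax trees.
    """
    match: str
    replace: str

    def _pattern(self) -> re.Pattern:
        # re.split with a capture group alternates literal text and hole names.
        parts = re.split(r":\[(\w+)\]", self.match)
        regex = ""
        for i, part in enumerate(parts):
            if i % 2 == 0:
                regex += re.escape(part)  # literal template text
            else:
                # Named group for the hole: word chars, dots, and quotes.
                regex += f"(?P<{part}>[\\w.\"']+)"
        return re.compile(regex)

    def apply(self, code: str) -> str:
        def substitute(m: re.Match) -> str:
            out = self.replace
            for name, value in m.groupdict().items():
                out = out.replace(f":[{name}]", value)
            return out

        return self._pattern().sub(substitute, code)


def validate(rule: RewriteRule, examples: list) -> list:
    """Return the examples the rule fails to reproduce (empty list = all pass)."""
    return [ex for ex in examples if rule.apply(ex.before) != ex.after]


# Hypothetical migration: two-argument os.path.join -> pathlib.Path.
rule = RewriteRule(
    match="os.path.join(:[a], :[b])",
    replace="str(Path(:[a]) / :[b])",
)
examples = [
    MigrationExample('p = os.path.join(root, "data.csv")',
                     'p = str(Path(root) / "data.csv")'),
    MigrationExample("os.path.join(cfg.dir, name)",
                     "str(Path(cfg.dir) / name)"),
]
print(validate(rule, examples))  # every example reproduced, so an empty list
```

Checking candidate rules against examples is one way to make generalized migration logic testable: a rule that fails to reproduce an example (for instance a three-argument `os.path.join` call, which this toy pattern does not match) is flagged instead of being silently applied to a codebase.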