Developers rely heavily on Application Programming Interfaces (APIs) from libraries to build their software. As software evolves, developers may need to replace the libraries they use with alternative libraries, a process known as library migration. Performing a migration manually can be tedious, time-consuming, and error-prone. Automated migration techniques can alleviate some of this burden. However, designing effective automated migration techniques requires understanding the types of code changes needed to transform client code that uses the old library so that it uses the new library. This paper contributes an empirical study that provides a holistic view of Python library migrations, both in terms of the code changes required in a migration and the typical development effort involved. We manually label 3,096 migration-related code changes in 335 Python library migrations from 311 client repositories, spanning 141 library pairs from 35 domains. Based on our labeled data, we derive PyMigTax, a taxonomy for describing migration-related code changes. Leveraging PyMigTax and our labeled data, we investigate various characteristics of Python library migrations, such as the types of program elements and properties of API mappings, the combinations of types of migration-related code changes in a migration, and the typical development effort required for a migration. Our findings highlight several potential shortcomings of current library migration tools. For example, we find that 40% of library pairs have API mappings that involve non-function program elements, whereas most library migration techniques assume that function calls from the source library map to (one or more) function calls from the target library. As an approximation of the development effort involved, we find that, on average, a developer needs to learn about 4 APIs and 2 API mappings to perform a migration, and ...
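To make the distinction between function-to-function mappings and mappings that involve non-function program elements more concrete, here is a minimal, hypothetical sketch. The library pair (`posixpath` to `pathlib`, both from the Python standard library) is chosen purely for illustration and is not claimed to be part of the study's dataset; the labels in the comments ("attribute access", "overloaded operator") are informal descriptions, not PyMigTax terminology. The function names `split_and_join_old` and `split_and_join_new` are likewise made up for this example.

```python
import posixpath
from pathlib import PurePosixPath


def split_and_join_old(path: str) -> tuple[str, str, str]:
    """Before migration: every source-library API used here is a function call."""
    parent = posixpath.dirname(path)                 # function call
    name = posixpath.basename(path)                  # function call
    moved = posixpath.join(parent, "backup", name)   # function call
    return parent, name, moved


def split_and_join_new(path: str) -> tuple[str, str, str]:
    """After migration: the same results through non-function program elements."""
    p = PurePosixPath(path)          # function call -> class instantiation
    parent = p.parent                # function call -> attribute access
    name = p.name                    # function call -> attribute access
    moved = parent / "backup" / name # function call -> overloaded `/` operator
    return str(parent), name, str(moved)


if __name__ == "__main__":
    for path in ["/srv/data/report.csv", "logs/app.log"]:
        # For these inputs the migrated code behaves identically.
        assert split_and_join_old(path) == split_and_join_new(path)
        print(split_and_join_new(path))
```

A tool that only searches for target-library function calls to replace `posixpath.dirname` or `posixpath.basename` would miss the attribute and operator counterparts above. The mapping is also not purely syntactic: for a bare filename such as `"app.log"`, `posixpath.dirname` returns `""` while `PurePosixPath("app.log").parent` is `"."`, so a migration tool would have to account for such semantic differences as well.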