Library migration, which re-implements the same software behavior using a different library in place of the current one, is widely observed in software evolution. An essential part of library migration is finding an analogical API that provides the same functionality as the current one. However, given the large number of libraries and APIs, finding an analogical API manually is time-consuming and error-prone. Researchers have developed multiple automated techniques for analogical API recommendation, among which documentation-based methods have attracted particular interest. Despite their potential, these methods have limitations, such as an incomplete semantic understanding of documentation and limited scalability. In this work, we propose KGE4AR, a novel documentation-based approach that leverages knowledge graph (KG) embedding to recommend analogical APIs during library migration. Specifically, KGE4AR constructs a novel unified API KG that comprehensively and structurally represents three types of knowledge in documentation, allowing it to better capture high-level semantics. KGE4AR then embeds the unified API KG into vectors, enabling more effective and scalable similarity calculation. We build KGE4AR's unified API KG for 35,773 Java libraries and evaluate it in two API recommendation scenarios: with and without target libraries. Our results show that KGE4AR substantially outperforms state-of-the-art documentation-based techniques in both scenarios on all metrics (e.g., MRR improvements of 47.1%-143.0% and 11.7%-80.6% in the two scenarios, respectively). Additionally, we explore KGE4AR's scalability and confirm that it scales effectively as the number of libraries grows.
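The abstract describes recommending analogical APIs by comparing embedding vectors and reports results in terms of MRR (mean reciprocal rank). The sketch below illustrates those two ideas under stated assumptions: each API is assumed to have already been embedded into a vector, the three-dimensional values and the `libA`/`libB` API names are hypothetical toy data, and cosine similarity is an assumed choice of similarity measure, not necessarily the one KGE4AR uses. It ranks candidate target APIs for each source API and then scores the rankings against hypothetical ground-truth analogies.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_candidates(source_vec, candidates):
    """Return candidate API names sorted by descending similarity to the source API."""
    return sorted(candidates, key=lambda name: cosine(source_vec, candidates[name]), reverse=True)

def mean_reciprocal_rank(rankings, gold):
    """Average over queries of 1/rank of the first correct analogical API (0 if none is ranked)."""
    total = 0.0
    for query, ranked in rankings.items():
        for position, api in enumerate(ranked, start=1):
            if api in gold[query]:
                total += 1.0 / position
                break
    return total / len(rankings)

# Hypothetical toy embeddings for APIs of a source library (libA) and a target library (libB).
source_embeddings = {
    "libA.Json.parse": [0.9, 0.1, 0.0],
    "libA.Json.write": [0.1, 0.9, 0.1],
    "libA.Json.readFile": [0.4, 0.4, 0.5],
}
target_embeddings = {
    "libB.Reader.read": [0.8, 0.2, 0.1],
    "libB.Writer.write": [0.0, 1.0, 0.2],
    "libB.Config.load": [0.3, 0.3, 0.9],
}
# Hypothetical ground truth: the correct analogical API(s) for each source API.
gold = {
    "libA.Json.parse": {"libB.Reader.read"},
    "libA.Json.write": {"libB.Writer.write"},
    "libA.Json.readFile": {"libB.Reader.read"},
}

rankings = {api: rank_candidates(vec, target_embeddings) for api, vec in source_embeddings.items()}
mrr = mean_reciprocal_rank(rankings, gold)
print(f"MRR = {mrr:.3f}")  # prints "MRR = 0.833": correct answers ranked 1st, 1st, and 2nd
```

In a "with target library" scenario, the candidate dictionary would plausibly be restricted to one library's APIs; without a target library, it would span APIs from all libraries, which is where the cost of similarity calculation over precomputed vectors matters for scalability.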