Open source software (OSS) forms the backbone of modern technology infrastructure and attracts millions of developers as contributors. Recommending appropriate development tasks to these developers is both critical and challenging, because it requires accounting for developers' interests as well as the semantic features of project code. In this paper, we formulate the novel problem of code recommendation, which aims to predict developers' future contribution behaviors given their interaction history, the semantic features of source code, and the hierarchical file structures of projects. To capture the complex interactions among the multiple parties in this system, we propose CODER, a novel graph-based code recommendation framework for OSS developers. CODER jointly models microscopic user-code interactions and macroscopic user-project interactions via a heterogeneous graph, and bridges the two levels of information through aggregation on file-structure graphs that reflect the project hierarchy. Because reliable benchmarks are lacking, we also construct three large-scale datasets to facilitate future research in this direction. Extensive experiments show that CODER achieves superior performance under various experimental settings, including intra-project, cross-project, and cold-start recommendation. We will release all datasets, code, and data-retrieval utilities upon acceptance of this work.
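To make the idea of bridging file-level and project-level signals concrete, the following is a minimal illustrative sketch and not the CODER implementation. It assumes a toy file-structure tree, hand-picked 2-dimensional file embeddings, bottom-up mean pooling as the aggregation, and a dot-product score. All names (`TREE`, `FILE_EMB`, `aggregate`, `rank_files`) are hypothetical and do not come from the paper.

```python
from statistics import fmean

# Hypothetical toy project: each directory maps to its children; files are leaves.
TREE = {
    "repo": ["repo/src", "repo/README.md"],
    "repo/src": ["repo/src/model.py", "repo/src/utils.py"],
}

# Hypothetical 2-d semantic embeddings of source files.
# (Assumption: in practice these would come from a learned code encoder.)
FILE_EMB = {
    "repo/README.md": [0.0, 1.0],
    "repo/src/model.py": [1.0, 0.0],
    "repo/src/utils.py": [0.5, 0.5],
}


def aggregate(node):
    """Bottom-up mean pooling over the file-structure tree."""
    if node in FILE_EMB:  # leaf: a source file
        return FILE_EMB[node]
    children = [aggregate(child) for child in TREE[node]]
    # Average the child embeddings dimension by dimension.
    return [fmean(dim) for dim in zip(*children)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rank_files(user_emb, root="repo"):
    """Score each file by a file-level term plus a project-level term."""
    project_emb = aggregate(root)       # macroscopic (user-project) signal
    macro = dot(user_emb, project_emb)
    scores = {
        path: dot(user_emb, emb) + macro  # microscopic (user-code) + macroscopic
        for path, emb in FILE_EMB.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    print(rank_files([1.0, 0.0]))
```

In this sketch the project-level term is the same for every file in one project, so it does not change the ranking within a single project; it would only matter when comparing files across projects, as in a cross-project setting.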