Labeling issues with the skills required to complete them can help contributors choose tasks in Open Source Software projects. However, manually labeling issues is time-consuming and error-prone, and current automated approaches are mostly limited to classifying issues as bugs or non-bugs. We investigate the feasibility and relevance of automatically labeling issues with what we call "API-domains," which are high-level categories of APIs. Specifically, we posit that the APIs used in the source code affected by an issue can serve as a proxy for the type of skills (e.g., DB, security, UI) needed to work on the issue. We ran a user study (n=74) to assess the relevance of API-domain labels to potential contributors, leveraged the issues' descriptions and the project history to build prediction models, and validated the predictions with contributors (n=20) of the projects. Our results show that (i) newcomers to the project consider API-domain labels useful in choosing tasks, (ii) labels can be predicted with a precision of 84% and a recall of 78.6% on average, (iii) predictions reached up to 71.3% precision and 52.5% recall when training on one project and testing on another (transfer learning), and (iv) project contributors consider most of the predictions helpful in identifying needed skills. These findings suggest that our approach can be applied in practice to automatically label issues, assisting developers in finding tasks that better match their skills.
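To make the reported precision and recall figures concrete, the sketch below shows one common way to score multi-label predictions, where each issue can carry several API-domain labels at once. It is an illustration only: the abstract does not state how the scores were averaged, so micro-averaging is an assumption here, and the function name `micro_precision_recall`, the label names, and the example issues are all hypothetical.

```python
from typing import List, Set, Tuple


def micro_precision_recall(
    truth: List[Set[str]], predicted: List[Set[str]]
) -> Tuple[float, float]:
    """Micro-averaged precision and recall for multi-label predictions.

    Assumption: label counts are pooled over all issues before dividing.
    The averaging scheme used in the paper is not given in the abstract.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    # A true positive is a label that appears in both sets for the same issue.
    true_pos = sum(len(t & p) for t, p in zip(truth, predicted))
    n_predicted = sum(len(p) for p in predicted)
    n_actual = sum(len(t) for t in truth)

    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_actual if n_actual else 0.0
    return precision, recall


# Hypothetical data: three issues with their actual and predicted API-domains.
truth = [{"UI", "DB"}, {"Security"}, {"DB", "IO"}]
predicted = [{"UI"}, {"Security", "UI"}, {"DB"}]

precision, recall = micro_precision_recall(truth, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy data, three of the four predicted labels are correct and three of the five actual labels are recovered, which is the kind of trade-off the precision and recall figures above summarize.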