Many open source projects provide good first issues (GFIs) to attract and retain newcomers. Although several automated GFI recommenders have been proposed, they are limited to recommending generic GFIs and do not consider differences between individual newcomers. We observe mismatches between generic GFIs and the diverse backgrounds of newcomers, resulting in failed attempts, discouraged onboarding, and delayed issue resolution. To address this problem, we hypothesize that personalized first issues (PFIs) could help reduce these mismatches. To test this hypothesis, we empirically analyze 37 newcomers and the first issues they resolved across multiple projects. We find that the first issues resolved by the same newcomer share similarities in task type, programming language, and project domain. These findings underscore the need for a PFI recommender that improves over state-of-the-art approaches. To that end, we identify features that influence newcomers' personalized selection of first issues by analyzing the relationship between candidate features of the newcomers and the characteristics of their chosen first issues. We find that newcomers' expertise preference, OSS experience, activeness, and sentiment drive their personalized choice of first issues. Based on these findings, we propose a Personalized First Issue Recommender (PFIRec), which employs LambdaMART to rank candidate issues for a given newcomer by leveraging the identified influential features. We evaluate PFIRec on a dataset of 68,858 issues from 100 GitHub projects. The evaluation results show that PFIRec outperforms existing first issue recommenders, potentially doubling the probability that the top recommended issue is suitable for a specific newcomer and, in the median, reducing a newcomer's unsuccessful attempts to identify suitable first issues by one-third.
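The two outcome measures mentioned above (whether the top recommended issue is suitable, and how many unsuccessful attempts precede the first suitable issue) can be illustrated with a minimal sketch. The function names, the toy data, and the exact metric definitions below are assumptions made for illustration; they are not taken from the PFIRec evaluation itself.

```python
from statistics import median


def first_hit_rank(ranked_issue_ids, suitable_ids):
    """Return the 1-based position of the first suitable issue, or None."""
    for position, issue_id in enumerate(ranked_issue_ids, start=1):
        if issue_id in suitable_ids:
            return position
    return None


def top1_hit_rate(rankings):
    """Fraction of newcomers whose top-ranked issue is suitable."""
    hits = sum(
        1 for ranked, suitable in rankings
        if first_hit_rank(ranked, suitable) == 1
    )
    return hits / len(rankings)


def median_failed_attempts(rankings):
    """Median count of unsuitable issues ranked above the first suitable one."""
    attempts = []
    for ranked, suitable in rankings:
        rank = first_hit_rank(ranked, suitable)
        # Assumption: if no suitable issue is ranked at all,
        # every candidate counts as an unsuccessful attempt.
        attempts.append(len(ranked) if rank is None else rank - 1)
    return median(attempts)


# Hypothetical toy data: one (ranked candidates, suitable issues) pair per newcomer.
rankings = [
    (["i1", "i2", "i3"], {"i1"}),
    (["i4", "i5", "i6"], {"i6"}),
    (["i7", "i8", "i9"], {"i8"}),
]

print(top1_hit_rate(rankings))
print(median_failed_attempts(rankings))
```

Under these assumed definitions, a recommender that ranks better raises the first number and lowers the second; comparing the two values across recommenders mirrors the kind of comparison the abstract reports.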