Who is the Real Hero? Measuring Developer Contribution via Multi-dimensional Data Integration

Proper incentives are important for motivating developers in open-source communities, and this motivation is crucial for keeping open-source software development healthy. Providing such incentives requires an accurate and objective method for measuring developer contributions. However, existing methods rely heavily on manual peer review and therefore lack objectivity and transparency. Some automated effort-estimation approaches use only syntax-level or even text-level metrics, such as the number of changed lines of code, which are not robust. Furthermore, some approaches for identifying core developers either offer only a qualitative understanding without a quantitative score or depend on project-specific parameters, which limits their practicality in real-world projects. To this end, we propose CValue, a multidimensional information fusion-based approach to measuring developer contributions. CValue extracts both syntactic and semantic information from source code changes along four dimensions: modification amount, understandability, and the inter-function and intra-function impact of the modification. It then fuses this information to produce a contribution score for each commit in a project. Experimental results on 10 real-world projects with manually labeled ground truth show that CValue outperforms other approaches by 19.59%. We also validated that CValue's runtime of 83.39 seconds per commit is acceptable for application in real-world projects. Finally, a large-scale experiment on 174 projects detected 2,282 developers with inflated commits; of these, 2,050 made no syntax-level contribution and 103 were identified as bots.
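To make the contrast between a text-level metric (changed lines of code) and a syntax-level check concrete, here is a minimal Python sketch. It is not CValue and does not reproduce its four-dimensional analysis. The helper names `changed_lines` and `has_syntax_contribution` are hypothetical, the sketch handles only Python source, and it assumes that a change leaving the abstract syntax tree (AST) unchanged, such as a comment, blank-line, or formatting edit, makes no syntax-level contribution.

```python
import ast
import difflib


def changed_lines(before: str, after: str) -> int:
    """Text-level metric: number of added plus removed lines.

    difflib.ndiff prefixes lines with '+ ', '- ', '  ' or '? ';
    only '+ ' and '- ' lines are real additions or removals.
    """
    diff = difflib.ndiff(before.splitlines(), after.splitlines())
    return sum(1 for line in diff if line.startswith(("+ ", "- ")))


def has_syntax_contribution(before: str, after: str) -> bool:
    """Syntax-level check (an assumption for illustration, not CValue's method).

    ast.dump omits line/column attributes by default, and comments are not
    part of the AST, so comment, whitespace and layout edits compare equal.
    """
    return ast.dump(ast.parse(before)) != ast.dump(ast.parse(after))


if __name__ == "__main__":
    original = "def add(a, b):\n    return a + b\n"
    # Cosmetic edit: a header comment, a blank line and a trailing comment.
    cosmetic = "# Add two numbers.\ndef add(a, b):\n\n    return a + b  # sum\n"
    # Real edit: the operands are swapped, which changes the AST.
    swapped = "def add(a, b):\n    return b + a\n"

    for label, new in (("cosmetic", cosmetic), ("swapped", swapped)):
        print(label, changed_lines(original, new),
              has_syntax_contribution(original, new))
```

Here the cosmetic edit counts as 4 changed lines yet leaves the AST untouched, while the operand swap counts as only 2 changed lines but does alter the AST. This illustrates how a line-count metric can be inflated by commits that make no syntax-level change.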

翻译：合理的激励措施对于激发开源社区开发者的积极性至关重要，而这也是维持开源软件健康发展的关键。为提供此类激励，需要一种准确且客观的开发者贡献测量方法。然而，现有方法过度依赖人工同行评审，缺乏客观性与透明度。部分自动化工作量评估工作仅使用语法级甚至文本级信息（如代码变更行数），缺乏鲁棒性。此外，识别核心开发者的相关工作仅提供定性理解而无定量评分，或包含针对特定项目的参数，导致其在实际项目中缺乏实用性。为此，我们提出CValue——一种基于多维信息融合的开发者贡献测量方法。CValue从四个维度（修改量、可理解性、函数间及函数内修改影响）提取源代码变更中的语法与语义信息，通过信息融合为项目中的每次提交生成贡献评分。实验结果表明：在10个含人工标注基准真实项目上，CValue性能较其他方法提升19.59%。我们验证并证明CValue（单次提交处理耗时83.39秒）的性能可满足实际项目应用需求。此外，我们针对174个项目开展大规模实验，检测出2,282名具有膨胀提交的开发者——其中2,050人未产生任何语法贡献，103人被识别为机器人账户。