Who is the Real Hero? Measuring Developer Contribution via Multi-dimensional Data Integration

Proper incentives motivate developers in open-source communities, and that motivation is crucial to the healthy development of open-source software. Providing such incentives requires an accurate and objective method for measuring developer contribution. However, existing methods rely heavily on manual peer review and therefore lack objectivity and transparency. Automated effort-estimation approaches use metrics based only on syntax-level or even text-level information, such as the number of changed lines of code, and so lack robustness. Furthermore, some approaches for identifying core developers provide only a qualitative understanding without a quantitative score, while others depend on project-specific parameters, which makes them impractical for real-world projects. To this end, we propose CValue, an approach based on multidimensional information fusion for measuring developer contributions. CValue extracts both syntactic and semantic information from source code changes along four dimensions: modification amount, understandability, inter-function impact of the modification, and intra-function impact of the modification. It then fuses this information to produce a contribution score for each commit in a project. Experimental results show that CValue outperforms other approaches by 19.59% on 10 real-world projects with manually labeled ground truth. CValue takes 83.39 seconds per commit, a cost that our evaluation shows to be acceptable for real-world projects. Furthermore, we performed a large-scale experiment on 174 projects and detected 2,282 developers with inflated commits; of these, 2,050 made no syntax contribution and 103 were identified as bots.
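To make the idea of fusing four dimensions into a per-commit score concrete, the following is a minimal sketch and not CValue's actual algorithm. The abstract does not state the fusion rule, so the equal-weight mean, the assumption that each dimension is already normalized to [0, 1], the summation of commit scores per developer, and all names (`CommitFeatures`, `commit_score`, `developer_scores`) are hypothetical choices made only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CommitFeatures:
    """Hypothetical per-commit dimension scores, each assumed to lie in [0, 1]."""
    author: str
    modification_amount: float
    understandability: float
    inter_function_impact: float
    intra_function_impact: float


def commit_score(commit: CommitFeatures) -> float:
    """Fuse the four dimensions into one score.

    Assumption: an equal-weight mean. The abstract does not specify
    how CValue actually combines the dimensions.
    """
    dimensions = (
        commit.modification_amount,
        commit.understandability,
        commit.inter_function_impact,
        commit.intra_function_impact,
    )
    return sum(dimensions) / len(dimensions)


def developer_scores(commits: list[CommitFeatures]) -> dict[str, float]:
    """Aggregate per-commit scores by author (assumed here to be a plain sum)."""
    totals: dict[str, float] = defaultdict(float)
    for commit in commits:
        totals[commit.author] += commit_score(commit)
    return dict(totals)


if __name__ == "__main__":
    history = [
        CommitFeatures("alice", 0.8, 0.6, 0.4, 0.2),
        CommitFeatures("alice", 0.2, 0.2, 0.2, 0.2),
        # A commit with no measurable change contributes nothing under this rule.
        CommitFeatures("bob", 0.0, 0.0, 0.0, 0.0),
    ]
    for author, score in sorted(developer_scores(history).items()):
        print(f"{author}: {score:.2f}")
```

Under this toy rule, a developer with many commits that carry no measurable change still receives a score of zero. That mirrors the abstract's motivation for looking beyond commit counts or changed lines, although the real method may weight and combine the dimensions quite differently.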
