How Do Java Developers Reuse StackOverflow Answers in Their GitHub Projects?

StackOverflow (SO) is a widely used question-and-answer (Q&A) website for software developers and computer scientists. GitHub is an online development platform for storing, tracking, and collaborating on software projects. Prior work has related information mined from both platforms in order to link user accounts or compare developers' activities across platforms. However, little work has characterized the SO answers reused by GitHub projects. In this paper, we conducted an empirical study by mining the SO answers reused by Java projects available on GitHub. We created a hybrid approach that combines clone detection, keyword-based search, and manual inspection to identify the answer(s) that developers actually leveraged. Based on the identified answers, we further studied the topics of the discussion threads, answer characteristics (e.g., scores, ages, code lengths, and text lengths), and developers' reuse practices. We observed that most reused answers offer programs that implement specific coding tasks. Among all analyzed SO discussion threads, the reused answers often have higher scores, older ages, longer code, and longer text than the unused answers. In only 9% of scenarios (40/430) did developers fully copy answer code for reuse. In the remaining scenarios, they reused part of the code or wrote brand-new code from scratch. Our study characterized 130 SO discussion threads referred to by Java developers in 357 GitHub projects. Our empirical findings can guide SO answerers toward providing better answers and shed light on future research related to SO and GitHub.
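To make the keyword-based search step more concrete, here is a minimal sketch of how SO links might be located in Java source text. It is not the authors' tool: the regular expression, the function name `find_so_references`, and the post IDs in the sample are all assumptions introduced for illustration, and the clone-detection and manual-inspection steps are not covered.

```python
import re

# Assumed URL shapes: /a/<id> for an answer, /q/<id> or /questions/<id> for a question.
SO_LINK = re.compile(r"https?://(?:www\.)?stackoverflow\.com/(questions|q|a)/(\d+)")


def find_so_references(java_source: str) -> list[tuple[int, str, int]]:
    """Return (line_number, kind, post_id) for every SO link found in the text."""
    hits = []
    for line_no, line in enumerate(java_source.splitlines(), start=1):
        for match in SO_LINK.finditer(line):
            kind = "answer" if match.group(1) == "a" else "question"
            hits.append((line_no, kind, int(match.group(2))))
    return hits


# Hypothetical Java file; the post IDs are made up.
sample = """\
public class Util {
    // Adapted from https://stackoverflow.com/a/123456
    static int clamp(int v, int lo, int hi) { return Math.max(lo, Math.min(hi, v)); }
    // See also http://stackoverflow.com/questions/654321/some-title
}
"""

print(find_so_references(sample))
# → [(2, 'answer', 123456), (4, 'question', 654321)]
```

A link to a question alone does not say which answer was reused, which is presumably why the study pairs keyword search with clone detection and manual inspection.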
