StackOverflow (SO) is a widely used question-and-answer (Q&A) website for software developers and computer scientists. GitHub is an online development platform used for storing, tracking, and collaborating on software projects. Prior work has related information mined from both platforms to link user accounts or to compare developers' activities across platforms. However, little work has been done to characterize the SO answers reused by GitHub projects. In this paper, we conducted an empirical study by mining the SO answers reused by Java projects available on GitHub. We created a hybrid approach that combines clone detection, keyword-based search, and manual inspection to identify the answer(s) actually leveraged by developers. Based on the identified answers, we further studied the topics of the discussion threads, answer characteristics (e.g., scores, ages, code lengths, and text lengths), and developers' reuse practices. We observed that most reused answers offer programs that implement specific coding tasks. Among all analyzed SO discussion threads, the reused answers often have higher scores, older ages, longer code, and longer text than unused answers. In only 9% of scenarios (40/430), developers fully copied answer code for reuse. In the remaining scenarios, they reused partial code or created brand-new code from scratch. Our study characterized 130 SO discussion threads referred to by Java developers in 357 GitHub projects. Our empirical findings can guide SO answerers to provide better answers, and shed light on future research related to SO and GitHub.