In open source software (OSS) development, understanding and addressing incivility within issue discussions is crucial for fostering healthy and productive collaborations. This paper presents a curated dataset of 404 locked GitHub issue discussion threads and 5961 individual comments, collected from 213 OSS projects. We annotated the comments with various categories of incivility using Tone Bearing Discussion Features (TBDFs), and, for each issue thread, we annotated the triggers, targets, and consequences of incivility. We observed that "Bitter frustration", "Impatience", and "Mocking" are the most prevalent TBDFs in our dataset. The most common trigger, target, and consequence of incivility are "Failed use of tool/code or error messages", "People", and "Discontinued further discussion", respectively. This dataset can serve as a valuable resource for analyzing incivility in OSS and improving automated tools to detect and mitigate such behavior.