Understanding the causes of software defects is essential for reliable software maintenance and ecosystem stability. However, existing bug datasets do not distinguish issues that originate within a project from those caused by external dependencies or environmental factors. In this paper, we present InEx-Bug, a manually annotated dataset of 377 GitHub issues from 103 NPM repositories that categorizes each issue as Intrinsic (internal defect), Extrinsic (dependency or environment issue), Not-a-Bug, or Unknown. Beyond labels, the dataset includes rich temporal and behavioral metadata such as maintainer participation, code changes, and reopening patterns. Our analyses show that, compared to Extrinsic bugs, Intrinsic bugs are resolved faster (median 8.9 vs. 10.2 days), are closed more often (92% vs. 78%), and require code changes more frequently (57% vs. 28%). Extrinsic bugs, in contrast, exhibit higher reopen rates (12% vs. 4%) and a longer delay before recurrence (median 157 vs. 87 days). The dataset provides a foundation for further study of Intrinsic and Extrinsic defects in the NPM ecosystem.
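To make the per-category comparisons concrete, the following is a minimal Python sketch of how such statistics could be computed from labeled issue records. It assumes a simplified, hypothetical record layout: the field names (`days_to_close`, `has_code_change`, `reopened`) and the helper `summarize` are illustrative and are not taken from the released dataset. It also assumes that the median resolution time is computed over closed issues only. The toy records are invented for illustration and do not reproduce the reported figures.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import median
from typing import Optional


class Category(Enum):
    # The four labels described in the abstract.
    INTRINSIC = "Intrinsic"
    EXTRINSIC = "Extrinsic"
    NOT_A_BUG = "Not-a-Bug"
    UNKNOWN = "Unknown"


@dataclass
class IssueRecord:
    # Hypothetical schema; the real dataset's fields may differ.
    repo: str
    number: int
    category: Category
    closed: bool
    days_to_close: Optional[float]  # None while the issue is still open
    has_code_change: bool
    reopened: bool


def summarize(records, category):
    """Return per-category counts and rates, or None if no issue has that label."""
    subset = [r for r in records if r.category is category]
    if not subset:
        return None
    n = len(subset)
    # Assumption: resolution time is measured only for closed issues.
    durations = [r.days_to_close for r in subset
                 if r.closed and r.days_to_close is not None]
    return {
        "n": n,
        "median_days_to_close": median(durations) if durations else None,
        "close_rate": sum(r.closed for r in subset) / n,
        "code_change_rate": sum(r.has_code_change for r in subset) / n,
        "reopen_rate": sum(r.reopened for r in subset) / n,
    }


# Toy records for illustration only (not data from InEx-Bug).
toy_records = [
    IssueRecord("example/a", 1, Category.INTRINSIC, True, 2.0, True, False),
    IssueRecord("example/a", 2, Category.INTRINSIC, True, 5.0, False, False),
    IssueRecord("example/b", 3, Category.INTRINSIC, False, None, False, False),
    IssueRecord("example/b", 4, Category.EXTRINSIC, True, 10.0, False, True),
]

if __name__ == "__main__":
    for cat in Category:
        print(cat.value, summarize(toy_records, cat))
```

Filtering on `closed` before taking the median keeps still-open issues from distorting resolution time; a study could instead treat open issues as censored observations, which this sketch does not attempt.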