Communities on GitHub often use issue labels to triage issues, assigning each a priority rating based on how urgently it should be addressed. The labels are chosen by each repository's contributors and are not standardised by GitHub. This makes priority-related reasoning across repositories difficult for both researchers and contributors. Previous work has examined how issues are labelled and what consequences those labels have. For instance, some previous work has used clustering models and natural language processing to categorise labels, without a particular emphasis on priority. With this publication, we introduce a unique data set of 812 manually categorised labels pertaining to priority, normalised and ranked as low-, medium-, or high-priority. As an example of how this data set could be used, we have created a tool for GitHub contributors that lists the highest-priority issues from the repositories to which they contribute. We have released the data set and the tool on Zenodo for anyone to use, in the hope that this will help the open source community address high-priority issues more effectively and inspire other uses.
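To illustrate how normalised priority labels might be applied, the following is a minimal sketch in Python rather than the released tool. It assumes a lookup table from raw label text to one of the three priority levels. The entries in `LABEL_PRIORITY`, the `Issue` class, and both function names are hypothetical and are not taken from the data set or the tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical excerpt of a label-to-priority mapping. These entries are
# illustrative only and are not copied from the released data set.
LABEL_PRIORITY = {
    "priority: high": "high",
    "p1": "high",
    "priority: medium": "medium",
    "p2": "medium",
    "priority: low": "low",
    "p3": "low",
}

# Numeric rank used to compare the three normalised levels.
RANK = {"high": 3, "medium": 2, "low": 1}


@dataclass
class Issue:
    """Hypothetical, simplified stand-in for a GitHub issue."""
    repo: str
    title: str
    labels: List[str] = field(default_factory=list)


def issue_priority(issue: Issue) -> Optional[str]:
    """Return the highest normalised priority among the issue's labels.

    Labels are matched case-insensitively after trimming whitespace.
    Returns None when no label appears in the mapping.
    """
    keys = (label.strip().lower() for label in issue.labels)
    levels = [LABEL_PRIORITY[key] for key in keys if key in LABEL_PRIORITY]
    return max(levels, key=RANK.__getitem__, default=None)


def highest_priority_issues(issues: List[Issue]) -> List[Tuple[str, Issue]]:
    """Return (priority, issue) pairs, highest priority first.

    Issues without a recognised priority label are dropped. The sort is
    stable, so issues of equal priority keep their input order.
    """
    ranked = [(issue_priority(issue), issue) for issue in issues]
    known = [(level, issue) for level, issue in ranked if level is not None]
    known.sort(key=lambda pair: RANK[pair[0]], reverse=True)
    return known


if __name__ == "__main__":
    sample = [
        Issue("org/app", "Fix typo in README", ["P3"]),
        Issue("org/lib", "Crash on start-up", ["bug", "Priority: High"]),
    ]
    for level, issue in highest_priority_issues(sample):
        print(level, issue.repo, issue.title)
```

Because each repository's labels are reduced to the same three levels, issues from different repositories can be compared in a single ranked list, which is the kind of cross-repository reasoning the data set is meant to support.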