An issue tracker is a software tool used by organisations to interact with users and manage various aspects of the software development lifecycle. With the rise of agile methodologies, issue trackers have become popular in open and closed-source settings alike. Internal and external stakeholders report, manage, and discuss "issues", which represent different kinds of information, such as requirements and maintenance tasks. Issue trackers can quickly become complex ecosystems, with dozens of projects, hundreds of users, thousands of issues, and often millions of issue evolutions. Finding and understanding the issues relevant to the task at hand, and keeping an overview, become difficult over time. Moreover, managing issue workflows for diverse projects becomes more difficult as organisations grow and more stakeholders get involved. To help address these difficulties, software and requirements engineering research has suggested automated techniques based on mining issue tracking data. Given the vast amount of textual data in issue trackers, many of these techniques leverage natural language processing. This chapter discusses four major use cases for algorithmically analysing issue data to assist stakeholders with the complexity and heterogeneity of information in issue trackers. The chapter is accompanied by a follow-along demonstration package with Jupyter Notebooks.