Quantum Software Engineering (QSE) is essential for ensuring the reliability and maintainability of hybrid quantum-classical systems, yet empirical evidence on how bugs emerge and affect quality in real-world quantum projects remains limited. This study presents the first ecosystem-scale longitudinal analysis of software bugs across 123 open-source quantum repositories from 2012 to 2024, spanning eight functional categories: full-stack libraries, simulators, annealing, algorithms, compilers, assembly, cryptography, and experimental computing. Using a mixed-methods approach that combines repository mining, static code analysis, issue metadata extraction, and a validated rule-based classification framework, we analyze 32,296 verified bug reports. Results show that full-stack libraries and compilers are the most bug-prone categories, driven by circuit, gate, and transpilation-related issues, while simulators are mainly affected by measurement and noise-modeling errors. Classical bugs primarily impact usability and interoperability, whereas quantum-specific bugs disproportionately degrade performance, maintainability, and reliability. Longitudinal analysis indicates ecosystem maturation, with bug densities peaking between 2017 and 2021 and declining thereafter. High-severity bugs cluster in cryptography, experimental computing, and compiler toolchains. Repositories employing automated testing detect more bugs and resolve issues faster. A negative binomial regression further shows that automated testing is associated with an approximately 60 percent reduction in expected bug incidence. Overall, this work provides the first large-scale, data-driven characterization of quantum software bugs and offers empirical guidance for improving testing, documentation, and maintainability practices in QSE.
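To make the regression result concrete, the sketch below shows how a percentage reduction in expected bug incidence maps onto the quantities a negative binomial model with a log link typically reports: an incidence rate ratio (IRR) and its log-scale coefficient. This is only an illustration of the arithmetic. The abstract does not report the fitted coefficient or any baseline count, so the function names and the baseline of 100 bugs used here are hypothetical.

```python
import math


def irr_from_reduction(reduction: float) -> float:
    """Incidence rate ratio implied by a fractional reduction in expected count.

    A 60 percent reduction (reduction=0.60) corresponds to IRR = 0.40.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 - reduction


def coefficient_from_irr(irr: float) -> float:
    """Log-link coefficient whose exponential equals the given IRR."""
    return math.log(irr)


def expected_count(baseline: float, beta: float, has_testing: bool) -> float:
    """Expected bug count under a log-link model with one binary predictor.

    baseline is the expected count without automated testing (hypothetical).
    """
    indicator = 1 if has_testing else 0
    return baseline * math.exp(beta * indicator)


if __name__ == "__main__":
    irr = irr_from_reduction(0.60)      # value taken from the abstract
    beta = coefficient_from_irr(irr)    # implied coefficient, not a reported one
    print(f"IRR = {irr:.2f}, implied coefficient = {beta:.3f}")
    # Hypothetical baseline of 100 expected bugs without automated testing.
    print(f"With testing: {expected_count(100.0, beta, True):.1f} expected bugs")
```

Under these assumptions, a repository otherwise expected to accumulate 100 bugs would be expected to accumulate about 40 with automated testing, holding the other covariates fixed. The association is correlational; the arithmetic says nothing about causation.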