Understanding Bugs in Quantum Simulators: An Empirical Study

Quantum simulators are a foundational component of the quantum software ecosystem. They are widely used to develop and debug quantum programs, validate compiler transformations, and support empirical claims about correctness and performance. In the absence of large-scale quantum hardware, simulator outputs are often treated as ground truth for algorithm development and system evaluation. However, quantum simulators also introduce unique implementation challenges. They must faithfully emulate quantum behavior while executing on classical hardware, which requires complex representations of quantum state evolution, operator composition, and noise modeling. Despite this, we still lack a large-scale, in-depth study of failures in quantum simulators. To bridge this gap, this work presents a comprehensive empirical study of bugs in widely used open-source quantum simulators. We analyze 394 confirmed bugs from 12 simulators and manually categorize them by root cause, failure manifestation, affected component, and discovery mechanism. Our study reveals several key findings. First, bug discovery is largely user-driven: most crashes, exceptions, and resource-related failures escape automated testing and are identified only after deployment. Second, logical correctness failures are widespread and often silent, producing plausible but incorrect outputs without triggering crashes or explicit error signals. Third, many critical failures originate in classical simulator infrastructure, such as memory management, indexing, configuration, and dependency compatibility, rather than in core quantum execution logic. These findings provide new insights into the reliability challenges of quantum simulators and highlight opportunities to improve testing and validation practices in the quantum software ecosystem.
