As a new research area, quantum software testing lacks systematic testing benchmarks for assessing the effectiveness of testing techniques. Recently, some open-source benchmarks and mutation analysis tools have emerged. However, there is insufficient evidence on how various quantum circuit characteristics (e.g., circuit depth, number of quantum gates), algorithms (e.g., Quantum Approximate Optimization Algorithm), and mutation characteristics (e.g., mutation operators) affect mutant detection in quantum circuits. Studying such relations is important for systematically designing faulty benchmarks with varied attributes (e.g., the difficulty of detecting a seeded fault), which in turn enables efficient assessment of the cost-effectiveness of quantum software testing techniques. To this end, we present a large-scale empirical evaluation with more than 700K faulty benchmarks (quantum circuits) generated by mutating 382 real-world quantum circuits. Based on the results, we provide insights to help researchers define systematic quantum mutation analysis techniques. We also provide a tool that recommends mutants to users based on chosen characteristics (e.g., a quantum algorithm type) and the required difficulty of killing mutants. Finally, we provide faulty benchmarks that can already be used to assess the cost-effectiveness of quantum software testing techniques.