The Cost of Certainty: Shot Budgets in Quantum Program Testing

As quantum computing advances toward early fault-tolerant machines, testing and verification of quantum programs become urgent but costly, since each execution consumes scarce hardware resources. Unlike in classical software testing, every measurement must be carefully budgeted. This paper develops a unified framework for reasoning about how many measurements are required to verify quantum programs. The goal is to connect theoretical error bounds with concrete test strategies and to extend the analysis from individual tests to full program-level verification. We analyze the relationships among error probability, fidelity, trace distance, and the quantum Chernoff bound to establish fundamental limits on shot counts. These foundations are applied to three representative testing methods: the inverse test, the swap test, and the chi-square test. Both idealized and noisy devices are considered. We also introduce a program-level budgeting approach that allocates verification effort across multiple subroutines. The inverse test is the most measurement-efficient, the swap test requires about twice as many shots, and the chi-square test is the easiest to implement but often needs orders of magnitude more measurements. In the presence of noise, calibrated baselines may raise measurement requirements beyond theoretical estimates. At the program level, distributing a global fidelity target across many fine-grained functions can cause verification costs to grow rapidly, whereas coarser decompositions or weighted allocations remain more practical. The framework clarifies the trade-offs among testing strategies, noise handling, and program decomposition. It provides practical guidance for budgeting measurement shots in quantum program testing, helping practitioners balance rigour against cost when designing verification strategies.
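As a rough illustration of why the swap test needs about twice as many shots as the inverse test, the sketch below uses a simple one-sided detection model. This model is an assumption for illustration, not the paper's Chernoff-bound derivation. On an ideal device, one inverse-test shot passes with probability equal to the fidelity F, while one swap-test shot on pure states passes with probability (1 + F)/2. Requiring that a state with fidelity at most 1 - ε pass all N shots with probability at most δ gives N >= ln δ / ln F for the inverse test. Because ln(1 - ε/2) is approximately -ε/2, the swap test needs roughly twice as many shots. The function names are hypothetical. So is the even split of ε and δ across k subroutines in `uniform_program_budget`, which is included only to show how per-function budgets can grow under fine-grained decomposition.

```python
import math


def inverse_test_shots(fidelity_threshold: float, delta: float) -> int:
    """Shots needed so that a state with fidelity <= fidelity_threshold
    passes every inverse-test shot with probability at most delta.

    Assumption: ideal, noiseless device; one inverse-test shot passes
    with probability equal to the fidelity F.
    """
    return math.ceil(math.log(delta) / math.log(fidelity_threshold))


def swap_test_shots(fidelity_threshold: float, delta: float) -> int:
    """Same detection criterion for the swap test, whose single-shot
    pass probability on pure states is (1 + F) / 2."""
    pass_prob = (1.0 + fidelity_threshold) / 2.0
    return math.ceil(math.log(delta) / math.log(pass_prob))


def uniform_program_budget(num_functions: int, epsilon: float, delta: float) -> int:
    """Total inverse-test shots when a global infidelity budget epsilon and a
    failure probability delta are split evenly over num_functions subroutines.

    Hypothetical allocation: each subroutine gets infidelity epsilon / k and
    failure probability delta / k (a union-bound style split).
    """
    k = num_functions
    per_function = inverse_test_shots(1.0 - epsilon / k, delta / k)
    return k * per_function


if __name__ == "__main__":
    eps, delta = 0.01, 0.05
    n_inv = inverse_test_shots(1.0 - eps, delta)
    n_swap = swap_test_shots(1.0 - eps, delta)
    print(f"inverse: {n_inv}, swap: {n_swap}, ratio: {n_swap / n_inv:.2f}")
    # Total budget as the program is split into more, finer-grained functions.
    for k in (1, 10, 100):
        print(k, uniform_program_budget(k, eps, delta))
```

With ε = 0.01 and δ = 0.05, this model gives 299 inverse-test shots and 598 swap-test shots. Under the even-split assumption, the total grows roughly quadratically in k, up to a logarithmic factor, because each subroutine must resolve an infidelity of only ε/k.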