Stochastic contextual bandits are fundamental to sequential decision-making but pose significant challenges for existing neural network-based algorithms, particularly when scaling to quantum neural networks (QNNs), owing to massive over-parameterization, computational instability, and the barren plateau phenomenon. This paper introduces QNTK-UCB, a novel algorithm that leverages the Quantum Neural Tangent Kernel (QNTK) to address these limitations. By freezing the QNN at a random initialization and using its static QNTK as the kernel in ridge regression, QNTK-UCB bypasses the unstable training dynamics inherent in explicitly training parameterized quantum circuits while still exploiting the unique quantum inductive bias. For a time horizon $T$ and $K$ actions, our theoretical analysis establishes a parameter scaling of $\Omega((TK)^3)$ for QNTK-UCB, a substantial reduction from the $\Omega((TK)^8)$ required by the classical NeuralUCB algorithm for comparable regret guarantees. Empirical evaluations on non-linear synthetic benchmarks and quantum-native variational quantum eigensolver tasks demonstrate QNTK-UCB's superior sample efficiency in low-data regimes. This work highlights how the inherent properties of the QNTK provide implicit regularization and a sharper spectral decay, paving the way toward ``quantum advantage'' in online learning.
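The core mechanism described above, ridge regression with a fixed kernel plus a UCB exploration bonus, can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: the static QNTK Gram matrix is stood in for by a simple RBF kernel, and the ridge parameter `lam` and exploration weight `beta` are hypothetical placeholders.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Placeholder for the static QNTK k(x, y); any fixed PSD kernel works here.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_ucb_scores(K_hist, K_cand, k_diag, rewards, lam=1.0, beta=1.0):
    """UCB score per candidate arm from past contexts/rewards (generic kernel-UCB)."""
    A = K_hist + lam * np.eye(len(rewards))   # regularized Gram matrix
    alpha = np.linalg.solve(A, rewards)       # kernel ridge-regression weights
    mean = K_cand @ alpha                     # predicted reward for each arm
    v = np.linalg.solve(A, K_cand.T)          # A^{-1} k(x) for each candidate
    var = k_diag - np.einsum('ij,ji->i', K_cand, v)
    return mean + beta * np.sqrt(np.maximum(var, 0.0))  # mean + exploration bonus

# Toy round: 20 past (context, reward) pairs, 5 candidate arms.
rng = np.random.default_rng(0)
X_hist = rng.normal(size=(20, 4))
r_hist = np.sin(X_hist[:, 0])                 # synthetic non-linear rewards
X_cand = rng.normal(size=(5, 4))
K_hist = rbf_kernel(X_hist, X_hist)
K_cand = rbf_kernel(X_cand, X_hist)
k_diag = np.ones(len(X_cand))                 # k(x, x) = 1 for the RBF kernel
scores = kernel_ucb_scores(K_hist, K_cand, k_diag, r_hist)
arm = int(np.argmax(scores))                  # play the highest-UCB arm
```

Because the kernel is frozen at initialization, each round only solves a regularized linear system; no quantum-circuit parameters are updated, which is what sidesteps the barren plateau issue.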