Differentiable Quantum Architecture Search for Quantum Reinforcement Learning

Differentiable quantum architecture search (DQAS) is a gradient-based framework for automatically designing quantum circuits in the NISQ era. It was motivated by challenges such as the low fidelity of quantum hardware, the limited flexibility of circuit architectures, high circuit design cost, the barren plateau (BP) problem, and the periodicity of weights. It has been used to address error mitigation, unitary decomposition, and quantum approximate optimization problems based on fixed datasets. Quantum reinforcement learning (QRL) is a branch of quantum machine learning and typically involves varied data rather than a fixed dataset. QRL usually relies on a manually designed circuit. However, a predefined circuit lacks the flexibility to suit different tasks, and designing circuits for varied datasets can become intractable for large circuits. Whether DQAS can be applied to quantum deep Q-learning with varied datasets remains an open problem. The main goal of this work is to investigate the capability of DQAS to solve quantum deep Q-learning problems. We apply the gradient-based DQAS framework to reinforcement learning tasks and evaluate it in two environments: cart pole and frozen lake. The framework incorporates input and output weights, progressive search, and other new features. The experiments show that DQAS can design quantum circuits automatically and efficiently. In the evaluation, the automatically designed circuits significantly outperform the manually designed circuit. Furthermore, the performance of an automatically created circuit depends on how well the super-circuit learned during training. This work is the first to show that gradient-based quantum architecture search is applicable to QRL tasks.
