Differentiable Quantum Architecture Search for Quantum Reinforcement Learning

Differentiable quantum architecture search (DQAS) is a gradient-based framework for automatically designing quantum circuits in the NISQ era. It was motivated by challenges such as the low fidelity of quantum hardware, the limited flexibility of circuit architectures, the high cost of circuit design, the barren plateau (BP) problem, and the periodicity of weights. It has previously been used to address error mitigation, unitary decomposition, and quantum approximate optimization problems based on fixed datasets. Quantum reinforcement learning (QRL) is a branch of quantum machine learning and often involves varying data rather than a fixed dataset. QRL usually relies on a manually designed circuit. However, a predefined circuit lacks the flexibility needed for different tasks, and designing circuits for varying datasets can become intractable when the circuit is large. Whether DQAS can be applied to quantum deep Q-learning with varying datasets remains an open problem. The main goal of this work is to explore the capability of DQAS to solve quantum deep Q-learning problems. We apply the gradient-based DQAS framework to reinforcement learning tasks and evaluate it in two environments: cart pole and frozen lake. The framework as applied here incorporates input and output weights, progressive search, and other new features. The experiments show that DQAS can design quantum circuits automatically and efficiently. In the evaluation, the automatically designed circuits significantly outperform the manually designed circuit. Furthermore, the performance of an automatically created circuit depends on how well the super-circuit was learned during training. This work is the first to show that gradient-based quantum architecture search is applicable to QRL tasks.
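
To make the idea of a gradient-based architecture search more concrete, the following is a minimal sketch, not the authors' implementation. It assumes one common formulation in which each circuit placeholder holds a categorical distribution over candidate operations, and the distribution's logits are updated with a Monte Carlo score-function gradient. The names `toy_loss`, `TARGET`, and `search_architecture` are hypothetical, and the toy loss is only a stand-in for evaluating a sampled quantum circuit (for example, a deep Q-learning loss); no quantum simulation or circuit-weight training is performed here.

```python
import numpy as np

# Hypothetical "best" operation index for each of 3 placeholders.
# In a real search this is unknown; it exists only to define the toy loss.
TARGET = np.array([2, 0, 1])


def softmax(x):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def toy_loss(choice):
    """Stand-in for the loss of one sampled circuit structure.

    Counts the placeholders whose sampled operation differs from TARGET.
    """
    return float(np.sum(choice != TARGET))


def search_architecture(num_steps=300, batch=16, lr=0.5, seed=0):
    """Update architecture logits with a score-function gradient estimate."""
    rng = np.random.default_rng(seed)
    num_placeholders, num_ops = 3, 4
    alpha = np.zeros((num_placeholders, num_ops))  # architecture logits

    for _ in range(num_steps):
        probs = softmax(alpha)
        # Sample `batch` structures: one operation index per placeholder.
        samples = np.stack(
            [rng.choice(num_ops, size=batch, p=probs[i])
             for i in range(num_placeholders)],
            axis=1,
        )  # shape (batch, num_placeholders)
        losses = np.array([toy_loss(s) for s in samples])
        baseline = losses.mean()  # simple variance-reduction baseline

        # For a softmax categorical, d log p(s) / d alpha = onehot(s) - probs.
        onehot = np.eye(num_ops)[samples]  # (batch, placeholders, ops)
        grad = ((losses - baseline)[:, None, None] * (onehot - probs)).mean(axis=0)

        alpha -= lr * grad  # gradient descent on the expected loss

    return alpha


if __name__ == "__main__":
    alpha = search_architecture()
    print("most probable operation per placeholder:", alpha.argmax(axis=1))
```

In an actual DQAS-style setup, the sampled indices would select gates in a super-circuit and the circuit weights would be trained together with the architecture logits; the sketch keeps only the architecture update to show why the search is differentiable in the structure parameters.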
