We present a framework to integrate tensor network (TN) methods with reinforcement learning (RL) for solving dynamical optimisation tasks. We consider the RL actor-critic method, a model-free approach for solving RL problems, and introduce TNs as the approximators for its policy and value functions. Our "actor-critic with tensor networks" (ACTeN) method is especially well suited to problems with large and factorisable state and action spaces. As an illustration of the applicability of ACTeN we solve the exponentially hard task of sampling rare trajectories in two paradigmatic stochastic models, the East model of glasses and the asymmetric simple exclusion process (ASEP), the latter being particularly challenging to other methods due to the absence of detailed balance. With substantial potential for further integration with the vast array of existing RL methods, the approach introduced here is promising both for applications in physics and to multi-agent RL problems more generally.
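To make the core idea concrete, the following is a minimal sketch of how a matrix product state (MPS), one common tensor network, can serve as a function approximator over a factorisable state space of N binary sites. This is an illustration in the spirit of the abstract, not the paper's actual implementation: the tensor shapes, bond dimension, and function names here are our own assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 6, 4  # number of sites and MPS bond dimension (illustrative values)

# One rank-3 tensor per site: (left bond, physical index, right bond).
# Boundary tensors carry trivial bonds of dimension 1.
mps = [rng.normal(scale=0.1,
                  size=(1 if i == 0 else D,
                        2,
                        1 if i == N - 1 else D))
       for i in range(N)]

def value(state, mps):
    """Approximate value of a configuration: contract the MPS along the
    chain, selecting each site's physical index by that site's bit."""
    v = mps[0][:, state[0], :]           # shape (1, D)
    for i in range(1, N):
        v = v @ mps[i][:, state[i], :]   # successive bond contractions
    return float(v[0, 0])                # final shape (1, 1) -> scalar

state = rng.integers(0, 2, size=N)
print(value(state, mps))
```

A policy approximator can be built analogously, e.g. by contracting a similar MPS over state-action indices and normalising the result over the action space; the key point is that the cost of one evaluation scales linearly in N rather than with the exponentially large configuration space.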