DQN (Deep Q-Network) is a method for performing Q-learning in reinforcement learning with deep neural networks. DQNs require a large buffer and batch processing for experience replay and rely on backpropagation-based iterative optimization, which makes them difficult to implement on resource-limited edge devices. In this paper, we propose a lightweight on-device reinforcement learning approach for low-cost FPGA devices. It exploits a recently proposed neural-network-based on-device learning approach that does not rely on backpropagation but instead uses a training algorithm based on OS-ELM (Online Sequential Extreme Learning Machine). In addition, we propose combining L2 regularization and spectral normalization for on-device reinforcement learning so that the output values of the neural network fit within a certain range and the reinforcement learning becomes stable. The proposed reinforcement learning approach is designed for the PYNQ-Z1 board as a low-cost FPGA platform. Evaluation results using OpenAI Gym demonstrate that the proposed algorithm and its FPGA implementation complete a CartPole-v0 task 29.77x and 89.40x faster, respectively, than a conventional DQN-based approach when the number of hidden-layer nodes is 64.
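To make the backpropagation-free training concrete, the following is a minimal NumPy sketch of an OS-ELM regressor with an L2-regularized initialization and spectrally normalized input weights. It is an illustration under stated assumptions, not the paper's implementation: the class name `OSELM`, the ReLU activation, the uniform weight initialization, and the choice to apply spectral normalization to the fixed input weights and L2 regularization to the output weights are all assumptions made here. It covers only the network update, not the Q-learning loop or the FPGA design.

```python
import numpy as np


class OSELM:
    """Single-hidden-layer network trained by recursive least squares (no backprop)."""

    def __init__(self, n_in, n_hidden, n_out, l2=1.0, seed=0):
        rng = np.random.default_rng(seed)
        alpha = rng.uniform(-1.0, 1.0, size=(n_in, n_hidden))
        # Spectral normalization (assumed placement): divide the fixed random
        # input weights by their largest singular value.
        self.alpha = alpha / np.linalg.norm(alpha, ord=2)
        self.bias = rng.uniform(-1.0, 1.0, size=n_hidden)
        self.l2 = l2  # L2 regularization strength (hypothetical default)
        self.beta = np.zeros((n_hidden, n_out))  # trainable output weights
        self.P = None  # inverse of the regularized Gram matrix

    def hidden(self, x):
        # ReLU hidden layer; the activation choice is an assumption.
        return np.maximum(0.0, np.atleast_2d(x) @ self.alpha + self.bias)

    def init_train(self, x0, t0):
        # Initial batch step: ridge-regression solution for the output weights.
        h = self.hidden(x0)
        self.P = np.linalg.inv(h.T @ h + self.l2 * np.eye(h.shape[1]))
        self.beta = self.P @ h.T @ np.atleast_2d(t0)

    def seq_train(self, x, t):
        # Sequential step for one sample: rank-1 update of P, then of beta.
        h = self.hidden(x)   # shape (1, n_hidden)
        ph = self.P @ h.T    # shape (n_hidden, 1); P is symmetric
        self.P -= (ph @ ph.T) / (1.0 + (h @ ph).item())
        self.beta += self.P @ h.T @ (np.atleast_2d(t) - h @ self.beta)

    def predict(self, x):
        return self.hidden(x) @ self.beta
```

With a batch size of one, each sequential step needs only matrix-vector products and a scalar division, with no matrix inversion and no replay buffer, which is the property that makes this style of update attractive on small devices. In a Q-learning setting, `x` would be a state (or state-action) vector and `t` a target Q-value.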