Reinforcement Learning (RL) has opened up new opportunities to enhance existing smart systems that generally involve a complex decision-making process. However, modern RL algorithms, e.g., Deep Q-Networks (DQN), are based on deep neural networks and therefore incur high computational costs. In this paper, we propose QHD, an off-policy, value-based hyperdimensional reinforcement learning algorithm that mimics brain properties to achieve robust, real-time learning. QHD relies on a lightweight brain-inspired model to learn an optimal policy in an unknown environment. On both desktop and power-limited embedded platforms, QHD achieves significantly better overall efficiency than DQN while providing higher or comparable rewards. This efficiency makes QHD well suited to online and real-time learning. Our solution supports a small experience replay batch size, which provides a 12.3× speedup compared to DQN with minimal quality loss. Our evaluation demonstrates QHD's capability for real-time learning, providing a 34.6× speedup and significantly better quality of learning than DQN.