Reinforcement Learning (RL) trains agents to learn optimal behavior by maximizing reward signals from experience datasets. However, RL training often faces memory bottlenecks, leading to high execution latencies and prolonged training times. To overcome this, SwiftRL explores Processing-In-Memory (PIM) architectures to accelerate RL workloads. We achieve near-linear performance scaling by implementing RL algorithms such as Tabular Q-learning and SARSA on UPMEM PIM systems and applying hardware-aware optimizations. Our experiments on OpenAI Gym environments using UPMEM hardware demonstrate superior performance compared to CPU and GPU implementations.
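To make the two tabular algorithms named above concrete, here is a minimal sketch of their per-step update rules. The toy table size, hyperparameters (`alpha`, `gamma`), and function names are illustrative assumptions for exposition, not details of the SwiftRL implementation or its PIM kernels.

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy Q-learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    (alpha/gamma values here are illustrative assumptions.)"""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """On-policy SARSA step: bootstraps from the action a' actually
    chosen in s', rather than the greedy maximum."""
    td_target = r + gamma * Q[s_next, a_next]
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Tiny demo on a hypothetical 3-state, 2-action Q-table.
Q = np.zeros((3, 2))
Q = q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)
print(Q[0, 1])  # 0.1 * (1.0 + 0.99 * 0 - 0) = 0.1
```

Both updates touch only one row of the Q-table per step, which is what makes them natural candidates for distributing across many PIM processing units.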