Actor-critic methods are a class of model-free deep reinforcement learning (RL) algorithms that have demonstrated effectiveness across a range of robot learning tasks. While considerable research has focused on improving training stability and sample efficiency, deployment strategies have remained comparatively simple, typically relying on direct rollouts of the actor policy. In contrast, we propose \pachs{} (\textit{P}arallel \textit{A}ctor-\textit{C}ritic \textit{H}euristic \textit{S}earch), an efficient parallel best-first search algorithm for inference that leverages both components of the actor-critic architecture: the actor network generates actions, while the critic network provides cost-to-go estimates to guide the search. The search employs two levels of parallelism: actions and cost-to-go estimates are generated in batches by the actor and critic networks, respectively, and graph expansion is distributed across multiple threads. We demonstrate the effectiveness of our approach on robotic manipulation tasks, including collision-free motion planning and contact-rich interactions such as non-prehensile pushing. Visit p-achs.github.io for demonstrations and examples.
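To make the search structure concrete, below is a minimal single-threaded Python sketch of the batched best-first loop described above. The names \texttt{actor}, \texttt{critic}, \texttt{step}, and \texttt{is\_goal} are hypothetical stand-ins for the learned networks and the environment model; duplicate detection, path reconstruction, and the multi-threaded graph expansion used by \pachs{} are omitted.

\begin{verbatim}
# Minimal sketch of an actor-critic-guided best-first search.
# Hypothetical interfaces (not the paper's actual API):
#   actor(state, k) -> k candidate actions (batched policy samples)
#   critic(states)  -> cost-to-go estimates for a batch of states
#   step(state, a)  -> (successor_state, transition_cost)
import heapq
import itertools

def best_first_search(start, actor, critic, step, is_goal,
                      num_candidates=8, max_expansions=10_000):
    counter = itertools.count()  # tie-breaker so states are never compared
    # Each heap entry is (f = g + h, tie, g, state).
    open_list = [(critic([start])[0], next(counter), 0.0, start)]
    while open_list and max_expansions > 0:
        _, _, g, state = heapq.heappop(open_list)
        if is_goal(state):
            return state
        max_expansions -= 1
        # The actor proposes a batch of candidate actions at once.
        actions = actor(state, num_candidates)
        succs, costs = zip(*[step(state, a) for a in actions])
        # The critic scores all successors in a single batched call.
        h_values = critic(list(succs))
        for succ, c, h in zip(succs, costs, h_values):
            heapq.heappush(open_list, (g + c + h, next(counter), g + c, succ))
    return None  # search budget exhausted without reaching a goal
\end{verbatim}

The priority $g + c + h$ mirrors the standard best-first evaluation function, with the critic's cost-to-go estimate playing the role of the heuristic and the actor restricting expansion to a small batch of promising actions.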