Generalization in Deep Reinforcement Learning (DRL) across unseen environment variations often requires training over a diverse set of scenarios, yet many existing DRL algorithms become inefficient when the number of variations is large. The Generalist-Specialist Learning (GSL) framework addresses this by first training a generalist model on all variations, then initializing specialists from the generalist's weights, each focusing on a subset of variations; the generalist subsequently refines its learning with assistance from the specialists. However, GSL partitions tasks among specialists at random, which can assign vastly different variations to the same specialist and degrade performance; in practice, this often forces each specialist to focus on a single variation, raising computational costs. To address this, we propose Generalist-Specialist Learning with Point Cloud Feature-based Task Partitioning (GSL-PCD). Our approach clusters environment variations based on features extracted from object point clouds and uses balanced clustering with a greedy algorithm to assign similar variations to the same specialist. Evaluations on robotic manipulation tasks from the ManiSkill benchmark demonstrate that, with a fixed number of specialists, point cloud feature-based partitioning outperforms vanilla partitioning by 9.4%, and it reduces the computation and samples required to reach comparable performance by 50%.
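The balanced clustering step described above can be illustrated with a minimal sketch. This is not the paper's implementation: the feature extractor, the number of k-means refinement steps, and the capacity rule (ceil of variations per specialist) are all assumptions made for illustration. The sketch first runs a crude k-means on per-variation feature vectors, then greedily assigns variations to their nearest cluster center subject to an equal-size capacity constraint, so every specialist receives a similar number of similar variations.

```python
import numpy as np

def balanced_greedy_partition(features, n_specialists, seed=0):
    """Partition environment variations into equal-size groups of similar
    feature vectors (hypothetical sketch of balanced greedy clustering).

    features: (n_variations, feature_dim) array, e.g. pooled point cloud features.
    Returns an array of specialist indices, one per variation.
    """
    rng = np.random.default_rng(seed)
    n = len(features)
    capacity = int(np.ceil(n / n_specialists))  # max variations per specialist

    # Crude k-means to obtain cluster centers (assumed initialization scheme).
    centers = features[rng.choice(n, n_specialists, replace=False)]
    for _ in range(10):
        d = np.linalg.norm(features[:, None] - centers[None], axis=-1)
        labels = d.argmin(1)
        for k in range(n_specialists):
            pts = features[labels == k]
            if len(pts):
                centers[k] = pts.mean(0)

    # Greedy balanced assignment: visit (variation, center) pairs in order of
    # increasing distance, assigning each variation to the nearest center that
    # still has remaining capacity.
    d = np.linalg.norm(features[:, None] - centers[None], axis=-1)
    order = np.dstack(np.unravel_index(np.argsort(d, axis=None), d.shape))[0]
    assign = -np.ones(n, dtype=int)
    counts = np.zeros(n_specialists, dtype=int)
    for i, k in order:
        if assign[i] == -1 and counts[k] < capacity:
            assign[i] = k
            counts[k] += 1
    return assign
```

With two well-separated groups of variations and two specialists, each specialist ends up with one coherent group, rather than the mixed subsets a random partition would produce.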