Non-prehensile manipulation with onboard sensing presents a fundamental challenge: the manipulated object blocks part of the sensor's field of view, and the resulting occluded regions can lead to collisions. We propose CURA-PPO, a reinforcement learning framework that addresses this challenge by explicitly modeling uncertainty under partial observability. By predicting collision possibility as a distribution, we extract both risk and uncertainty to guide the robot's actions. The uncertainty term encourages active perception, so the robot manipulates the object and gathers information at the same time to resolve occlusions. Combined with confidence maps that capture observation reliability, our approach enables safe navigation despite severe sensor occlusion. Extensive experiments across varying object sizes and obstacle configurations show that CURA-PPO achieves up to 3× higher success rates than the baselines, and that its learned behaviors handle occlusions. Our method provides a practical solution for autonomous manipulation in cluttered environments using only onboard sensing.
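As a rough illustration of how risk and uncertainty might be read off a predicted collision distribution, the sketch below summarizes a set of sampled collision probabilities by their mean (risk) and standard deviation (uncertainty) and folds both into a shaped reward. Everything here is an assumption made for illustration: the sample-based representation, the function names `risk_and_uncertainty` and `shaped_reward`, the weights `w_risk` and `w_unc`, and the choice to penalize uncertainty are hypothetical and are not taken from CURA-PPO itself.

```python
from statistics import fmean, pstdev


def risk_and_uncertainty(collision_samples):
    """Summarize samples of a predicted collision probability.

    Assumption: the distribution is represented by a list of samples in [0, 1].
    Risk is taken as the mean, uncertainty as the population standard deviation.
    """
    risk = fmean(collision_samples)
    uncertainty = pstdev(collision_samples)
    return risk, uncertainty


def shaped_reward(task_reward, collision_samples, w_risk=1.0, w_unc=0.5):
    """Hypothetical shaping: penalize both expected risk and uncertainty.

    Penalizing uncertainty rewards actions that reduce it, for example by
    bringing occluded regions back into view.
    """
    risk, uncertainty = risk_and_uncertainty(collision_samples)
    return task_reward - w_risk * risk - w_unc * uncertainty


# Two predictions with the same mean risk but different spread.
confident = [0.10, 0.10, 0.10, 0.10]  # e.g. region clearly observed
occluded = [0.00, 0.00, 0.20, 0.20]   # e.g. region hidden by the object

r_conf = shaped_reward(1.0, confident)
r_occl = shaped_reward(1.0, occluded)
```

In this toy setup both predictions carry the same risk, so only the uncertainty penalty separates them: the occluded case earns a lower shaped reward, which is the kind of signal that could push a policy toward information-gathering motions.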