Reinforcement learning-based control policies have frequently been shown to be more effective than analytical techniques for many manipulation tasks. Commonly, these methods learn neural control policies that predict end-effector pose changes directly from observed state information. For tasks that impose force constraints, such as inserting delicate connectors, pose-based policies have limited explicit control over force and rely on carefully tuned low-level controllers to avoid executing damaging actions. In this work, we present hybrid position-force control policies that learn to dynamically select whether to use force or position control in each control dimension. To improve the learning efficiency of these policies, we introduce Mode-Aware Training for Contact Handling (MATCH), which adjusts policy action probabilities to explicitly mirror the mode selection behavior in hybrid control. We validate the effectiveness of MATCH's learned policies on fragile peg-in-hole tasks under extreme localization uncertainty. We find that MATCH substantially outperforms pose-control policies, solving these tasks with up to 10% higher success rates and 5x fewer peg breaks under common types of state estimation error. MATCH also demonstrates data efficiency equal to that of pose-control policies, despite learning in a larger and more complex action space. In over 1600 sim-to-real experiments, we find that MATCH succeeds twice as often as pose policies in high-noise settings (68% vs. 33%) and applies approximately 30% less force on average than variable impedance policies on a Franka FR3 in laboratory conditions.
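As a rough illustration of per-dimension mode selection in hybrid position-force control, the following minimal sketch blends a position term and a force term with a binary selection vector. It is not the MATCH implementation: the function name `hybrid_command`, the proportional gains `kp` and `kf`, and the simple proportional control laws are all assumptions made for this example.

```python
import numpy as np


def hybrid_command(mode, pos_err, force_err, kp=100.0, kf=0.5):
    """Blend per-axis position and force control with a binary selection vector.

    mode[i] == 1 selects force control on axis i; mode[i] == 0 selects
    position control. The gains kp and kf are hypothetical placeholders,
    and both control laws are plain proportional terms for illustration.
    """
    mode = np.asarray(mode, dtype=float)
    pos_term = kp * np.asarray(pos_err, dtype=float)        # position-control output per axis
    force_term = kf * np.asarray(force_err, dtype=float)    # force-control output per axis
    # Each axis receives exactly one of the two terms, chosen by its mode bit.
    return (1.0 - mode) * pos_term + mode * force_term


# Example: position control in x and y, force control in z.
command = hybrid_command(
    mode=[0, 0, 1],
    pos_err=[0.01, 0.02, 0.03],
    force_err=[1.0, 2.0, -4.0],
)
```

In a learned hybrid policy, the `mode` vector would come from the policy's output at each control step rather than being fixed by hand, which is what lets the policy switch an axis to force control on contact.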