Voxel-grid reinforcement learning (RL) is widely used for path planning on redundant manipulators because it is simple and reproducible. However, executing the resulting paths on a 7-DoF arm through point-wise numerical inverse kinematics (IK) often produces step-size jitter, abrupt joint transitions, and instability near singular configurations. This work proposes a framework that bridges discrete planning and continuous execution without modifying the discrete planner itself. On the planning side, step-normalized 26-neighbor Cartesian actions and a geometric tie-breaking mechanism suppress unnecessary turns and eliminate step-size oscillations. On the execution side, a task-priority damped least-squares (TP-DLS) IK layer treats end-effector position as the primary task and projects posture and joint centering into the null space as subordinate tasks, combined with trust-region clipping and joint velocity constraints. In experiments on a 7-DoF manipulator in random sparse, medium, and dense environments, the bridge raises planning success in dense scenes from about 0.58 to 1.00 and shortens the representative path length from roughly 1.53 m to 1.10 m. It also reduces peak joint accelerations by more than an order of magnitude while keeping end-effector error below 1 mm, substantially improving the continuous execution quality of voxel-based RL paths on redundant manipulators.
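To make the planning-side idea concrete, the following is a minimal sketch of one way step-normalized 26-neighbor actions and a geometric tie-break could be realized. It is an illustration under stated assumptions, not the authors' implementation: "step-normalized" is read here as rescaling every neighbor offset to the same Cartesian length, and the tie-break is assumed to prefer the candidate best aligned with the previous motion direction, then with the goal direction. The names `step_normalized_actions`, `pick_action`, `prev_dir`, `goal_dir`, and the 0.02 m step are all hypothetical.

```python
import itertools
import math


def step_normalized_actions(step=0.02):
    """Return the 26 neighbor directions, each rescaled to length `step` (meters).

    Assumption: normalization means axial, face-diagonal, and body-diagonal
    moves all cover the same Cartesian distance.
    """
    actions = []
    for d in itertools.product((-1, 0, 1), repeat=3):
        if d == (0, 0, 0):
            continue  # skip the "stay in place" offset
        n = math.sqrt(sum(c * c for c in d))
        actions.append(tuple(step * c / n for c in d))
    return actions


def _cos(a, b):
    """Cosine of the angle between two 3-vectors; 0.0 if either is zero."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def pick_action(actions, values, prev_dir, goal_dir, tol=1e-9):
    """Pick the highest-valued action; break ties geometrically.

    Among actions whose value is within `tol` of the best, prefer the one
    most aligned with the previous direction (fewer turns), then the one
    most aligned with the goal direction. This ordering is an assumption.
    """
    best = max(values)
    tied = [a for a, v in zip(actions, values) if best - v <= tol]
    return max(tied, key=lambda a: (_cos(a, prev_dir), _cos(a, goal_dir)))
```

With equal values for all actions, `pick_action(actions, values, (1, 0, 0), (1, 1, 0))` keeps moving along +x rather than turning, which is the behavior the tie-break is meant to encourage.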
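The execution-side layer can be sketched as a single task-priority damped least-squares update. This is a simplified illustration, not the authors' code: it assumes a 3×n position Jacobian `J` supplied by the caller, implements only joint centering as the subordinate task (the posture task is omitted), and enforces the trust region and the per-joint limit by one uniform scale factor so that the step direction is preserved. The function names and all gain, damping, and limit values are hypothetical. Note that with nonzero damping the projector I − Jᵀ(JJᵀ + λ²I)⁻¹J is only approximately a null-space projector.

```python
import math


def solve3(A, b):
    """Solve the 3x3 system A x = b by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(3):
        p = max(range(c, 3), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, 3):
            f = M[r][c] / M[c][c]
            for k in range(c, 4):
                M[r][k] -= f * M[c][k]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        s = sum(M[r][k] * x[k] for k in range(r + 1, 3))
        x[r] = (M[r][3] - s) / M[r][r]
    return x


def tp_dls_step(J, err, q, q_mid, damping=1e-3, k_null=0.5,
                trust_radius=0.1, dq_max=0.05):
    """One joint-space step: primary position task plus null-space joint centering.

    J: 3 x n position Jacobian, err: 3-vector position error,
    q, q_mid: current and mid-range joint angles (length n).
    """
    n = len(q)
    # A = J J^T + damping^2 I (3x3), the damped task-space matrix.
    A = [[sum(J[i][k] * J[j][k] for k in range(n))
          + (damping ** 2 if i == j else 0.0)
          for j in range(3)] for i in range(3)]

    def damped_pinv_apply(v):
        """Return J^T A^{-1} v, the damped pseudoinverse applied to v."""
        y = solve3(A, v)
        return [sum(J[i][k] * y[i] for i in range(3)) for k in range(n)]

    dq_task = damped_pinv_apply(err)                      # primary task
    dq_sec = [-k_null * (qi - mi) for qi, mi in zip(q, q_mid)]  # centering
    J_dq_sec = [sum(J[i][k] * dq_sec[k] for k in range(n)) for i in range(3)]
    leak = damped_pinv_apply(J_dq_sec)  # part of dq_sec that moves the task
    dq = [t + s - l for t, s, l in zip(dq_task, dq_sec, leak)]

    # Uniform scaling: trust-region radius on the norm, limit on each joint.
    norm = math.sqrt(sum(d * d for d in dq))
    peak = max(abs(d) for d in dq)
    scale = 1.0
    if norm > trust_radius:
        scale = min(scale, trust_radius / norm)
    if peak > dq_max:
        scale = min(scale, dq_max / peak)
    return [scale * d for d in dq]
```

Calling `tp_dls_step` once per waypoint with the current Jacobian and position error yields a bounded joint increment; when the position error is zero, the step moves joints toward `q_mid` while leaving the end-effector position nearly unchanged, up to a residual on the order of the squared damping.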