Learning-based motion planning can quickly generate near-optimal trajectories. However, it often requires either large training datasets or the costly collection of human demonstrations. This work proposes an alternative approach that quickly generates smooth, near-optimal, collision-free 3D Cartesian trajectories from a single artificial demonstration. The demonstration is encoded as a Dynamic Movement Primitive (DMP) and iteratively reshaped using policy-based reinforcement learning to create a diverse trajectory dataset for varying obstacle configurations. This dataset is used to train a neural network that takes as input the task parameters describing the obstacle dimensions and location, derived automatically from a point cloud, and outputs the DMP parameters that generate the trajectory. The approach is validated in simulation and real-robot experiments, outperforming an RRT-Connect baseline in terms of computation and execution time, as well as trajectory length, while supporting multi-modal trajectory generation for different obstacle geometries and end-effector dimensions. Videos and the implementation code are available at https://github.com/DominikUrbaniak/obst-avoid-dmp-pi2.
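To make the DMP representation mentioned above concrete, the sketch below integrates a standard one-dimensional discrete DMP: a critically damped spring-damper system toward the goal, modulated by a learned forcing term built from Gaussian basis functions. This is only an illustrative sketch with common default gains, not the paper's implementation; the function name `dmp_rollout` and all parameter values are assumptions. In the paper's pipeline, the weight vector `w` would be the quantity reshaped by policy-based reinforcement learning and predicted by the neural network from the obstacle task parameters.

```python
import numpy as np

def dmp_rollout(w, y0=0.0, g=1.0, tau=1.0, dt=0.001,
                alpha=25.0, beta=25.0 / 4.0, alpha_x=8.0):
    """Integrate a 1-D discrete DMP and return the position trajectory.

    w       : weights of the Gaussian basis functions (the learned parameters)
    y0, g   : start and goal positions
    tau     : movement duration scaling
    alpha, beta, alpha_x : commonly used default gains (assumed values)
    """
    n = len(w)
    # Basis-function centers spaced along the phase variable x,
    # with widths derived from the spacing between centers.
    c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n))
    h = 1.0 / np.diff(c) ** 2
    h = np.append(h, h[-1])

    x, y, yd = 1.0, y0, 0.0          # phase, position, velocity
    traj = []
    for _ in range(int(tau / dt)):
        psi = np.exp(-h * (x - c) ** 2)
        # Forcing term: weighted basis activations, gated by the phase x
        # so it vanishes as the movement converges to the goal.
        f = x * (g - y0) * (psi @ w) / (psi.sum() + 1e-10)
        ydd = (alpha * (beta * (g - y) - tau * yd) + f) / tau ** 2
        yd += ydd * dt
        y += yd * dt
        x += (-alpha_x * x / tau) * dt  # canonical system: exponential decay
        traj.append(y)
    return np.array(traj)
```

With zero weights the forcing term vanishes and the rollout reduces to a smooth point-to-point motion from `y0` to `g`; nonzero weights reshape the path (e.g., to bend around an obstacle) while preserving convergence to the goal, which is what makes the weights a convenient output space for the trained network.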