This study applies a reinforcement learning algorithm to the trajectory planning of manipulators. A 7-DOF robotic arm must pick up a randomly placed block and place it at a random target point in an unknown environment. A randomly moving obstacle hinders the pick operation. The robot's objective is to avoid the obstacle and pick up the block within a fixed time limit. In this work, we apply the deep deterministic policy gradient (DDPG) algorithm and compare the model's efficiency under dense and sparse rewards.
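To make the distinction between the two reward schemes concrete, the following is a minimal sketch of one common convention for goal-reaching manipulation tasks. The abstract does not give the reward definitions actually used, so the function names, the distance-based form, and the tolerance `tol` are all assumptions made for illustration.

```python
import numpy as np


def sparse_reward(gripper_pos, target_pos, tol=0.05):
    """Assumed sparse scheme: 0 when within `tol` of the target, -1 otherwise.

    `tol` is a hypothetical success threshold, not a value from the study.
    """
    dist = np.linalg.norm(np.asarray(gripper_pos, dtype=float)
                          - np.asarray(target_pos, dtype=float))
    return 0.0 if dist <= tol else -1.0


def dense_reward(gripper_pos, target_pos):
    """Assumed dense scheme: negative Euclidean distance to the target."""
    dist = np.linalg.norm(np.asarray(gripper_pos, dtype=float)
                          - np.asarray(target_pos, dtype=float))
    return -float(dist)
```

Under these assumptions the sparse reward gives the agent feedback only on success, while the dense reward provides a graded signal at every step, which is the kind of trade-off such a comparison would typically examine.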