Curiosity is one of the main drivers of exploration in many natural creatures with measurable levels of intelligence and, as a result, of more efficient learning. It lets humans and many animals explore efficiently by seeking out states that surprise them, with the goal of learning more about what they do not know; curious agents therefore learn better. In the machine learning literature, curiosity is mostly combined with reinforcement learning algorithms as an intrinsic reward. This work proposes a curiosity-driven algorithm that autonomously learns to control a quadcopter by generating appropriate motor speeds from odometry data. A quadcopter controlled by the proposed algorithm can pass through obstacles while steering its yaw toward the desired location. To achieve this, we also propose a new curiosity approach based on prediction error. We ran tests using on-policy, off-policy, on-policy plus curiosity, and the proposed algorithm, and visualized the effect of curiosity on the evolution of exploration patterns. Results show that the proposed algorithm learns an optimal policy and maximizes reward where the other algorithms fail to do so.