This contribution addresses the problem of exploration in knowledge transfer. Knowledge transfer refers to applying the knowledge gained while learning a source task to a target task, with the intended benefit of speeding up learning on the target task. The article compares several exploration methods within a deep transfer learning algorithm, specifically Deep Target Transfer $Q$-learning: $\epsilon$-greedy, Boltzmann, and upper confidence bound exploration. The transfer learning algorithm and the exploration methods were tested on a virtual drone problem. The results show that upper confidence bound exploration performs best among these options. Whether this result carries over to other applications remains to be verified.
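For orientation, the three exploration rules named above can be sketched in their textbook form over a vector of action values. This is a minimal illustration only: the abstract does not specify the variants used inside Deep Target Transfer $Q$-learning, so the function names (`epsilon_greedy`, `boltzmann`, `ucb`), the parameters (`epsilon`, `temperature`, `c`), and the use of per-action visit counts in the UCB bonus are all assumptions, not the authors' implementation.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a uniformly random action, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def boltzmann_probs(q_values, temperature):
    """Softmax over action values; a higher temperature flattens the distribution."""
    top = max(q_values)  # subtract the maximum for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def boltzmann(q_values, temperature, rng=random):
    """Sample an action from the Boltzmann (softmax) distribution."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def ucb(q_values, counts, t, c=2.0):
    """UCB1-style rule (assumed form): try each untried action once,
    then pick the action maximizing value plus an exploration bonus.
    counts[a] is how often action a was taken; t is the total step count (t >= 1)."""
    for action, n in enumerate(counts):
        if n == 0:
            return action
    scores = [q + c * math.sqrt(math.log(t) / n) for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=lambda a: scores[a])
```

In a deep setting the action values would come from a network's output for the current state, and the visit counts needed by the UCB bonus would generally have to be approximated; how the article handles this is not stated in the abstract.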