We extend the notion of Cantor-Kantorovich distance between Markov chains, introduced by Banse et al. (2023), to the context of Markov Decision Processes (MDPs). The proposed metric is well-defined and can be efficiently approximated given a finite horizon. We then provide numerical evidence that this metric can lead to interesting applications in the field of reinforcement learning. In particular, we show that it could be used for forecasting the performance of transfer learning algorithms.