We provide a new non-asymptotic analysis of distributed temporal difference learning with linear function approximation. Our approach relies on ``one-shot averaging,'' where $N$ agents run identical local copies of the TD(0) method and average the outcomes only once at the very end. We demonstrate a version of the linear time speedup phenomenon, where the convergence time of the distributed process is a factor of $N$ smaller than the convergence time of TD(0). This is the first result proving benefits from parallelism for temporal difference methods.
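To make the one-shot averaging scheme concrete, the following is a minimal sketch, not the setup analyzed in the paper. It assumes a small randomly generated Markov reward process, state-dependent rewards, unit-norm features, and a constant step size; the helper names (`make_mrp`, `td_fixed_point`, `td0`, `one_shot_average`) and all parameter values are hypothetical and chosen only for illustration. Each of the $N$ agents runs TD(0) on its own independent trajectory, and the final iterates are averaged exactly once.

```python
import numpy as np


def make_mrp(n_states, n_features, rng):
    """Random Markov reward process (illustrative assumption, not from the paper)."""
    P = rng.random((n_states, n_states))
    P /= P.sum(axis=1, keepdims=True)          # row-stochastic transition matrix
    r = rng.random(n_states)                   # reward depends on the current state only
    Phi = rng.standard_normal((n_states, n_features))
    Phi /= np.linalg.norm(Phi, axis=1, keepdims=True)  # unit-norm feature vectors
    return P, r, Phi


def td_fixed_point(P, r, Phi, gamma):
    """Solve Phi^T D (I - gamma P) Phi theta = Phi^T D r for the TD(0) limit point."""
    evals, evecs = np.linalg.eig(P.T)
    mu = np.real(evecs[:, np.argmax(np.real(evals))])
    mu = mu / mu.sum()                         # stationary distribution of P
    D = np.diag(mu)
    A = Phi.T @ D @ (np.eye(len(r)) - gamma * P) @ Phi
    b = Phi.T @ D @ r
    return np.linalg.solve(A, b)


def td0(P, r, Phi, gamma, alpha, T, rng):
    """One agent: T steps of TD(0) with linear function approximation."""
    n = len(r)
    theta = np.zeros(Phi.shape[1])
    s = rng.integers(n)
    for _ in range(T):
        s_next = rng.choice(n, p=P[s])
        # temporal difference error for the transition s -> s_next
        delta = r[s] + gamma * (Phi[s_next] @ theta) - Phi[s] @ theta
        theta = theta + alpha * delta * Phi[s]
        s = s_next
    return theta


def one_shot_average(P, r, Phi, gamma, alpha, T, N, seed=0):
    """Run N independent TD(0) copies and average their final iterates once."""
    children = np.random.SeedSequence(seed).spawn(N)
    rngs = [np.random.default_rng(c) for c in children]
    thetas = np.stack([td0(P, r, Phi, gamma, alpha, T, g) for g in rngs])
    return thetas.mean(axis=0), thetas


if __name__ == "__main__":
    P, r, Phi = make_mrp(5, 3, np.random.default_rng(0))
    theta_star = td_fixed_point(P, r, Phi, gamma=0.9)
    theta_avg, thetas = one_shot_average(P, r, Phi, 0.9, 0.05, 1000, N=8)
    single = np.linalg.norm(thetas - theta_star, axis=1).mean()
    print("mean single-agent error:", single)
    print("averaged-iterate error: ", np.linalg.norm(theta_avg - theta_star))
```

The agents never communicate during the run; the only coordination is the final `mean` over the stacked iterates. By the triangle inequality the averaged iterate is never farther from the fixed point than the agents are on average, but a toy run like this does not by itself demonstrate the factor-of-$N$ speedup, which is what the non-asymptotic analysis establishes.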