The replicability crisis in the social, behavioral, and data sciences has led to the formulation of algorithmic frameworks for replicability, that is, the requirement that an algorithm produce identical outputs (with high probability) when run on two different samples from the same underlying distribution. Although the area is still in its infancy, provably replicable algorithms have been developed for many fundamental tasks in machine learning and statistics, including statistical query learning, the heavy hitters problem, and distribution testing. In this work we initiate the study of replicable reinforcement learning, providing a provably replicable algorithm for parallel value iteration and a provably replicable version of R-max in the episodic setting. These are the first formal replicability results for control problems, which present different challenges for replication than batch learning settings.
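To make the replicability requirement concrete, the sketch below illustrates it with randomized rounding, a standard building block from earlier work on replicable statistical queries. It is not the value-iteration or R-max procedure described here. It assumes that the two runs share the algorithm's internal random bits (here, a seeded `random.Random`) while seeing independent samples. The names `replicable_mean`, `naive_mean`, `agreement_rate`, and the grid width `alpha` are hypothetical choices for illustration. The empirical mean is snapped to a grid whose offset is drawn from the shared randomness. Two runs then disagree only when a grid boundary falls between their two estimates, which happens with probability roughly equal to (gap between estimates) / `alpha`.

```python
import random
import statistics


def replicable_mean(samples, rng, alpha=0.1):
    """Mean of samples, snapped to a grid of width alpha with a random offset.

    `rng` plays the role of the algorithm's internal randomness (an
    illustrative assumption). Two runs sharing the same rng seed but seeing
    independent samples return the same value unless a grid boundary falls
    between their empirical means.
    """
    estimate = statistics.fmean(samples)
    offset = rng.uniform(0.0, alpha)  # random shift of the rounding grid
    # Snap to the nearest point of {offset + k * alpha : k an integer}.
    k = round((estimate - offset) / alpha)
    return offset + k * alpha


def naive_mean(samples, rng):
    """Plain empirical mean; ignores rng, shown only for comparison."""
    return statistics.fmean(samples)


def agreement_rate(estimator, p=0.3, n=5_000, trials=100, seed=0):
    """Fraction of trials in which two independent samples yield identical outputs."""
    data_rng = random.Random(seed)
    agree = 0
    for t in range(trials):
        # Two independent samples of n Bernoulli(p) draws from the same distribution.
        s1 = [1.0 if data_rng.random() < p else 0.0 for _ in range(n)]
        s2 = [1.0 if data_rng.random() < p else 0.0 for _ in range(n)]
        # Both runs reuse the same internal randomness (seed t).
        out1 = estimator(s1, random.Random(t))
        out2 = estimator(s2, random.Random(t))
        agree += out1 == out2
    return agree / trials


if __name__ == "__main__":
    print("naive agreement:     ", agreement_rate(naive_mean))
    print("replicable agreement:", agreement_rate(replicable_mean))
```

The grid width `alpha` trades accuracy against replicability. The rounded output is within `alpha / 2` of the empirical mean, and a wider grid makes it less likely that the two runs round to different points. The naive estimator rarely agrees across runs, because two independent samples seldom produce exactly the same empirical mean.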