The replicability crisis in the social, behavioral, and data sciences has led to the formulation of algorithmic frameworks for replicability, that is, the requirement that an algorithm produce identical outputs (with high probability) when run on two different samples from the same underlying distribution. Although the study of replicability is still in its infancy, provably replicable algorithms have been developed for many fundamental tasks in machine learning and statistics, including statistical query learning, the heavy hitters problem, and distribution testing. In this work we initiate the study of replicable reinforcement learning, providing a provably replicable algorithm for parallel value iteration and a provably replicable version of R-max in the episodic setting. These are the first formal replicability results for control problems, which pose challenges for replication that differ from those of batch learning settings.
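To make the definition of replicability concrete, here is a minimal sketch of one standard building block from the replicability literature. It is not the paper's value-iteration or R-max algorithm. The idea is to estimate a mean and then round it to a grid whose random offset comes from internal randomness shared between runs. Two runs on independent samples return the same value unless a grid boundary falls between their empirical means, which happens with probability roughly proportional to the gap between those means divided by the grid width `tau`. The function names, the Bernoulli data distribution, and all parameter values are illustrative assumptions.

```python
import random
from typing import Sequence


def replicable_mean(samples: Sequence[float], tau: float, rng: random.Random) -> float:
    """Illustrative replicable mean estimate (hypothetical helper).

    Computes the empirical mean, then snaps it to the nearest point of the grid
    {offset + k * tau}, where offset is drawn from rng. Passing identically
    seeded generators to two runs models shared internal randomness.
    """
    empirical = sum(samples) / len(samples)
    offset = rng.uniform(0.0, tau)           # shared random grid offset
    k = round((empirical - offset) / tau)    # index of the nearest grid point
    return offset + k * tau


def estimate_replicability(p: float, n: int, tau: float, trials: int, seed: int) -> float:
    """Fraction of trials in which two runs on independent samples agree exactly."""
    data_rng = random.Random(seed)
    agreements = 0
    for trial in range(trials):
        # Two independent samples of size n from the same Bernoulli(p) distribution.
        sample_a = [1.0 if data_rng.random() < p else 0.0 for _ in range(n)]
        sample_b = [1.0 if data_rng.random() < p else 0.0 for _ in range(n)]
        # Both runs receive an identically seeded generator (shared randomness).
        out_a = replicable_mean(sample_a, tau, random.Random(trial))
        out_b = replicable_mean(sample_b, tau, random.Random(trial))
        agreements += out_a == out_b
    return agreements / trials


if __name__ == "__main__":
    print(estimate_replicability(p=0.3, n=5000, tau=0.2, trials=200, seed=7))
```

The shared seed is essential. If each run drew its own grid offset, the two outputs would essentially never coincide, which is why replicability definitions let both runs use the same internal random bits. Making `tau` smaller improves accuracy, since the output lies within `tau / 2` of the empirical mean, but it lowers the agreement rate unless the sample size grows to match.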