The replicability crisis in the social, behavioral, and data sciences has led to the formulation of algorithmic frameworks for replicability, i.e., a requirement that an algorithm produce identical outputs (with high probability) when run on two different samples from the same underlying distribution. While the field is still in its infancy, provably replicable algorithms have been developed for many fundamental tasks in machine learning and statistics, including statistical query learning, the heavy hitters problem, and distribution testing. In this work we initiate the study of replicable reinforcement learning, providing a provably replicable algorithm for parallel value iteration and a provably replicable version of R-max in the episodic setting. These are the first formal replicability results for control problems, which present different challenges for replication than batch learning settings.