Bugs in widely used distributed protocol implementations have caused many outages in popular internet services. We describe a randomized testing approach for distributed protocol implementations based on reinforcement learning. Since the natural reward structure is very sparse, the key to successful exploration in reinforcement learning is reward augmentation. We show two different techniques that build on one another. First, we provide a decaying exploration bonus based on the discovery of new states -- the reward decays as the same state is visited repeatedly. The exploration bonus captures the intuition from coverage-guided fuzzing of prioritizing new coverage points; in contrast to other schemes, we show that taking the maximum of the bonus and the Q-value leads to more effective exploration. Second, we provide waypoints to the algorithm as a sequence of predicates that capture interesting semantic scenarios. Waypoints exploit designer insight about the protocol and guide the exploration to ``interesting'' parts of the state space. Our reward structure ensures that new episodes can reliably reach deep interesting states even without execution caching. We have implemented our algorithm in Go. Our evaluation on three large benchmarks (RedisRaft, Etcd, and RSL) shows that our algorithm significantly outperforms baseline approaches in terms of coverage and bug finding.
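The first technique can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the visit-count decay schedule, the state abstraction, and all identifiers here are assumptions made for exposition. The key point it demonstrates is combining the bonus with the learned Q-value via a maximum rather than a sum, so that novel states dominate action selection until their novelty wears off.

```go
package main

import "fmt"

// stateVisits counts how often each abstract state has been seen.
// The bonus decays as 1/visits, mirroring coverage-guided fuzzing's
// preference for inputs that reach new coverage points.
// (Hypothetical state encoding: a string key per abstract state.)
var stateVisits = map[string]int{}

// explorationBonus returns a reward that decays with repeat visits.
func explorationBonus(state string) float64 {
	stateVisits[state]++
	return 1.0 / float64(stateVisits[state])
}

// augmentedValue takes the maximum of the learned Q-value and the
// decaying bonus, rather than summing them: an unvisited state's
// bonus overrides a pessimistic Q-estimate, while a stale bonus
// falls back to the Q-value once the state is well explored.
func augmentedValue(qValue, bonus float64) float64 {
	if bonus > qValue {
		return bonus
	}
	return qValue
}

func main() {
	q := 0.3 // assumed learned Q-value for one state-action pair
	for i := 0; i < 4; i++ {
		b := explorationBonus("s0")
		fmt.Printf("visit %d: bonus=%.2f augmented=%.2f\n",
			i+1, b, augmentedValue(q, b))
	}
}
```

After enough visits the bonus drops below the Q-value and the max leaves the learned estimate in control, which is the decay behavior the abstract describes.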
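The waypoint mechanism can likewise be sketched as an ordered list of designer-written predicates. The state fields and the example scenario below are hypothetical placeholders (loosely Raft-flavored), not the paper's actual encoding; the sketch only shows how an episode is rewarded for reaching each waypoint in sequence.

```go
package main

import "fmt"

// State is an abstract snapshot of the protocol under test; the
// fields are illustrative assumptions, not the paper's encoding.
type State struct {
	Term          int
	LeaderElected bool
}

// A Waypoint is a predicate over states that marks an "interesting"
// semantic milestone supplied by the protocol designer.
type Waypoint func(State) bool

// waypointReward walks the episode's trace and counts how many
// waypoints were reached in order, so the reward grows as the
// episode penetrates deeper into the targeted scenario.
func waypointReward(trace []State, waypoints []Waypoint) int {
	reached := 0
	for _, s := range trace {
		if reached < len(waypoints) && waypoints[reached](s) {
			reached++
		}
	}
	return reached
}

func main() {
	// Example scenario: first elect a leader, then advance past term 1.
	waypoints := []Waypoint{
		func(s State) bool { return s.LeaderElected },
		func(s State) bool { return s.Term >= 2 },
	}
	trace := []State{
		{Term: 1},
		{Term: 1, LeaderElected: true},
		{Term: 2, LeaderElected: true},
	}
	fmt.Println("waypoints reached:", waypointReward(trace, waypoints))
}
```

Because the reward depends only on the predicates and the trace, a fresh episode can be steered back to a deep state without replaying cached executions, which is the property the abstract claims for its reward structure.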