Rollback recovery strategies are well known in concurrent and distributed systems. In these settings, recovering from unexpected failures is even more relevant given the non-deterministic nature of execution, which makes it practically impossible to foresee all possible process interactions. In this work, we consider a message-passing concurrent programming language where processes interact only through message sending and receiving; shared memory is not allowed. For this language, we design a checkpoint-based rollback recovery strategy that does not require central coordination. For this purpose, we extend the language with three new operators: check, commit, and rollback. Furthermore, our approach is purely asynchronous, which is an essential ingredient for developing a source-to-source program instrumentation that implements a rollback recovery strategy.
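To give an intuition for how the three operators might interact, the following is a toy, single-threaded Python simulation. It is a minimal sketch under our own assumptions, not the semantics defined in this work. The `System` and `Process` classes, the integer-sum local state, the unique message ids, and the `withdraw` rule are all hypothetical. Under that rule, rolling back undoes every message sent since the checkpoint, and any receiver that has already consumed such a message must itself roll back. Real asynchrony is abstracted away by delivering messages immediately.

```python
import itertools
from collections import deque


class System:
    """Hypothetical registry that delivers messages between toy processes."""

    def __init__(self):
        self.procs = {}
        self._ids = itertools.count()

    def spawn(self, pid):
        self.procs[pid] = Process(pid, self)
        return self.procs[pid]

    def deliver(self, dest, value):
        msg = (next(self._ids), value)  # unique id so the message can be withdrawn
        self.procs[dest].mailbox.append(msg)
        return msg


class Process:
    def __init__(self, pid, system):
        self.pid, self.system = pid, system
        self.state = 0            # hypothetical local state: sum of received values
        self.mailbox = deque()
        self.checkpoints = {}     # checkpoint id -> saved info, in creation order

    def check(self, cid):
        # Save the local state and start logging consumed/sent messages.
        self.checkpoints[cid] = {"state": self.state, "consumed": [], "sent": []}

    def commit(self, cid):
        # Discard the checkpoint: work done since it can no longer be undone.
        self.checkpoints.pop(cid)

    def send(self, dest, value):
        msg = self.system.deliver(dest, value)
        for cp in self.checkpoints.values():
            cp["sent"].append((dest, msg))

    def receive(self):
        msg = self.mailbox.popleft()
        for cp in self.checkpoints.values():
            cp["consumed"].append(msg)
        self.state += msg[1]
        return msg[1]

    def rollback(self, cid):
        cp = self.checkpoints[cid]
        keys = list(self.checkpoints)
        for k in keys[keys.index(cid):]:  # drop cid and all later checkpoints
            del self.checkpoints[k]
        self.state = cp["state"]
        # Messages consumed since the checkpoint return to the mailbox, in order.
        self.mailbox.extendleft(reversed(cp["consumed"]))
        # Undo messages sent since the checkpoint; this may cascade.
        for dest, msg in cp["sent"]:
            self.system.procs[dest].withdraw(msg)

    def withdraw(self, msg):
        if msg not in self.mailbox:
            # Already consumed: roll back to the latest checkpoint preceding it.
            for cid in reversed(list(self.checkpoints)):
                if msg in self.checkpoints[cid]["consumed"]:
                    self.rollback(cid)
                    break
            else:
                raise RuntimeError(f"{self.pid}: no checkpoint precedes {msg}")
        self.mailbox.remove(msg)
```

In this sketch, each process decides locally when to call `check`, `commit`, or `rollback`, and no coordinator is involved. The cost of that choice is that a rollback may cascade to other processes, and it fails outright when a receiver has already committed past an orphan message.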