Cloud applications often insert a caching layer in front of a database to reduce I/O latency and improve throughput. A complication arises when a client fetches some data from one cache node and then migrates to another (e.g., due to failures, load balancing, or client mobility), where it fetches the remaining data. If the data in the cache nodes is inconsistent, the client can observe states that undermine the application's correctness. This pattern is common in stateful serverless workflows, which consist of multiple serverless functions that access state in a remote database. Functions in the same workflow may be scheduled to different nodes with different caches, producing the migration pattern described above: the same client (the workflow) reads some data from one cache and other data from another. To address this issue, this paper presents CausalMesh, a novel approach to causally consistent distributed caching in environments where computations may migrate between machines. CausalMesh is the first cache system to support coordination-free, abort-free read/write operations and read transactions when clients migrate across multiple servers. CausalMesh also supports read-write transactional causal consistency in the presence of client migration, but at the cost of abort-freedom. Our experimental evaluation shows that CausalMesh has lower latency and higher throughput than existing proposals. Finally, we have formally verified the correctness of CausalMesh's protocol in Dafny.