On the Limits of Causal Observation in Shared-Memory Systems

Determining whether one concurrent operation completed before another began is a basic prerequisite for reasoning about the correctness of concurrent systems. We formalize this task as the Causal Observability Problem (COP): assign timestamps to the observable boundary events of a concurrent execution (invocations and responses) so that they faithfully reflect the real-time order of operations. A solution is complete if it never misses a genuine precedence, and sound if it never reports a spurious one. We prove that a strongly consistent solution, one that is both complete and sound, cannot be achieved at the observable boundary. We then show that where instrumentation events are placed relative to operation boundaries determines what a monitor can guarantee: internal placement yields completeness, external placement yields soundness, and neither yields both. This dichotomy holds regardless of the underlying timestamp mechanism. We instantiate the framework with three non-blocking implementations of a Causal Monitor object: FAInc (a centralized atomic counter), Striped (a decentralized counter), and Collect (an iterative register snapshot). FAInc and Striped are linearizable, whereas Collect is only quiescently consistent. Despite this difference in consistency, we prove that all three provide identical COP guarantees: placement alone determines observable behavior. We validate these claims empirically on a 64-core NUMA machine, showing that Striped matches Collect in throughput while remaining linearizable, and thereby avoids the cache-contention bottleneck that limits FAInc at high thread counts.
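The placement dichotomy can be checked on small executions. The sketch below is a minimal illustration under stated assumptions, not the construction used in this work: it assumes an idealized clock that records each instrumentation event at its exact real time, and it takes "internal" to mean that both stamps of an operation fall inside its invocation-response interval `[inv, res]`, and "external" to mean that the start stamp is taken no later than the invocation and the end stamp no earlier than the response. A precedence is reported when one operation's end stamp is smaller than another's start stamp. All names (`Op`, `internal`, `external`, `real_precedes`, `reported_precedes`, `is_complete`, `is_sound`, `random_run`) are hypothetical.

```python
import random
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Op:
    """An operation on a shared real-time axis (hypothetical model).

    inv/res: real times of the invocation and response events.
    start_ts/end_ts: timestamps the monitor records for this operation.
    """
    name: str
    inv: int
    res: int
    start_ts: int
    end_ts: int


def internal(name: str, inv: int, res: int, start_ts: int, end_ts: int) -> Op:
    # Assumed internal placement: both stamps lie inside [inv, res].
    assert inv <= start_ts <= end_ts <= res
    return Op(name, inv, res, start_ts, end_ts)


def external(name: str, inv: int, res: int, start_ts: int, end_ts: int) -> Op:
    # Assumed external placement: start stamp at or before inv, end stamp at or after res.
    assert start_ts <= inv <= res <= end_ts
    return Op(name, inv, res, start_ts, end_ts)


def real_precedes(a: Op, b: Op) -> bool:
    # Genuine real-time precedence: a responded before b was invoked.
    return a.res < b.inv


def reported_precedes(a: Op, b: Op) -> bool:
    # Precedence the monitor reports from its recorded stamps.
    return a.end_ts < b.start_ts


def is_complete(ops: list[Op]) -> bool:
    # Complete: every genuine precedence is reported.
    return all(reported_precedes(a, b)
               for a, b in permutations(ops, 2) if real_precedes(a, b))


def is_sound(ops: list[Op]) -> bool:
    # Sound: every reported precedence is genuine.
    return all(real_precedes(a, b)
               for a, b in permutations(ops, 2) if reported_precedes(a, b))


def random_run(placement: str, n: int = 6, seed: int = 0) -> list[Op]:
    # Random execution with integer times (exact, no rounding), stamped per placement.
    rng = random.Random(seed)
    ops = []
    for i in range(n):
        inv = rng.randint(0, 100)
        res = inv + rng.randint(1, 20)
        if placement == "internal":
            s = rng.randint(inv, res)
            ops.append(internal(f"op{i}", inv, res, s, rng.randint(s, res)))
        else:
            ops.append(external(f"op{i}", inv, res,
                                inv - rng.randint(0, 5), res + rng.randint(0, 5)))
    return ops


# Internal stamps: A and B overlap in real time, yet A is reported before B.
internal_run = [internal("A", 0, 10, 1, 2),
                internal("B", 5, 15, 6, 14),
                internal("C", 20, 30, 21, 29)]

# External stamps: A really precedes B, but the stamps miss it.
external_run = [external("A", 0, 10, -1, 13),
                external("B", 12, 20, 11, 21),
                external("C", 30, 40, 25, 41)]

print(is_complete(internal_run), is_sound(internal_run))  # True False
print(is_complete(external_run), is_sound(external_run))  # False True

# Randomized check of the two guarantees over many executions.
for seed in range(200):
    assert is_complete(random_run("internal", seed=seed))
    assert is_sound(random_run("external", seed=seed))
```

Under these assumptions each guarantee follows from a chain of inequalities. With internal stamps, `A.res < B.inv` gives `A.end_ts <= A.res < B.inv <= B.start_ts`, so no genuine precedence is missed. With external stamps, `A.end_ts < B.start_ts` gives `A.res <= A.end_ts < B.start_ts <= B.inv`, so no reported precedence is spurious. The hand-built runs show each placement failing the other property; the sketch does not reproduce the impossibility proof, nor does it model FAInc, Striped, or Collect.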

翻译：确定一个并发操作是否在另一个操作开始之前完成，是推理并发系统正确性的基本前提。我们将这一挑战形式化为因果可观测性问题（COP）：为并发执行的可观测边界事件（即调用与响应）分配时间戳，使其忠实反映操作的实时顺序。若一个解法能捕获所有真实的前序关系，则称为完备的；若从不报告虚假的前序关系，则称为可靠的。我们证明，在可观测边界上无法实现同时具备完备性与可靠性的强一致性解法。进一步表明，检测事件相对于操作边界的位置决定了监测器所能保证的性质：内部布局保证完备性，外部布局保证可靠性，而二者无法兼得。这一二分法独立于底层时间戳机制。我们通过三种因果监测器对象的非阻塞实现来实例化该框架：FAInc（集中式原子计数器）、Striped（去中心化计数器）与Collect（迭代寄存器快照）。其中FAInc与Striped是可线性化的，而Collect仅满足静默一致性。尽管存在这种内部一致性差距，我们证明三者提供的COP保证完全相同：仅布局决定可观测行为。我们在64核NUMA架构上通过实验验证了这些结论，表明Striped在保持可线性化的同时，吞吐量可与Collect媲美，并解决了高线程数下FAInc的缓存竞争瓶颈。