The Cascade Log: Reference-Stable Windowing over Tiered Append Sequences

From arXiv: 22 pages, 9 figures, 3 tables. Ancillary files provided: reference implementation, seeded workloads, benchmark harness, raw CSV results, and figure scripts.

A long-running append-mostly sequence, such as an edit log, event store, or versioned working set, is usually tiered into a bounded hot stratum and colder folded summaries. This saves memory but breaks stable references: a handle minted while a record is hot may later be resolved after the record has moved into a digest, after it has been superseded, or while a fold is in flight. We define the resulting cross-tier anomalies (dangling, stale, corrupt, and snapshot-skewed resolution) and present the Cascade Log, a reference-stable tiered append structure. The structure keeps a single persistent coalescing interval map over handles as the sole authority on each live version; folding a contiguous run replaces many singleton entries with one digest-backed interval node, and immutable roots serve as snapshot tokens. Its cost is characterized by the fragmentation $A$, the number of index pieces, namely live handles plus maximal same-digest runs. The index uses $\Theta(A)$ space, resolves a point in $O(\log A)$ time, reports a $k$-handle range in $O(\log A + k)$ time, and performs $a$ appends and $s$ supersedes with $O((a/B + s)\log A)$ total update work for fold block size $B$. Matching lower bounds show that $\Omega(A)$ space and $\Omega(\log A + k)$ ordered range cost are unavoidable, and an adversary can force $A = \Theta(s)$. Thus the index is sublinear on append-dominated histories and grows linearly only under fragmenting edits. A reference implementation and reproducible experiments on up to $10^6$ records validate anomaly-freedom and the fragmentation bounds.
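The central mechanism, a coalescing interval map in which every handle is covered by exactly one piece and the fragmentation $A$ is the number of pieces, can be illustrated with the minimal Python sketch below. It is not the paper's reference implementation. The names (`CoalescingIndex`, `Hot`, `Digest`, `fold`, `supersede`, `report`, `snapshot`) are hypothetical, handles are assumed to be consecutive integers, and a sorted list searched with `bisect` stands in for the persistent balanced tree. Lookups therefore cost $O(\log A)$, but updates and snapshots cost $O(A)$ rather than the bounds stated above. The sketch also assumes, for illustration only, that one digest may be filled by several folds, which is when adjacent same-digest pieces coalesce.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Iterator, List, Tuple, Union


@dataclass(frozen=True)
class Hot:
    """Hot singleton: the live record value for one handle."""
    value: str


@dataclass(frozen=True)
class Digest:
    """Folded run: every handle in the interval resolves through this digest."""
    digest_id: str


Payload = Union[Hot, Digest]
Piece = Tuple[int, int, Payload]  # (lo, hi, payload), inclusive handle range


class CoalescingIndex:
    """Hypothetical sketch: disjoint pieces covering handles 0..next_handle-1.

    Each handle lies in exactly one piece, its sole authority; the number of
    pieces plays the role of the fragmentation A.
    """

    def __init__(self) -> None:
        self.pieces: List[Piece] = []
        self.starts: List[int] = []  # start handle of each piece, for bisect
        self.next_handle = 0

    @property
    def fragmentation(self) -> int:
        return len(self.pieces)

    def _splice(self, i: int, j: int, new: List[Piece]) -> None:
        # Replace pieces[i:j] by `new`; a plain list splice, so O(A) here.
        self.pieces[i:j] = new
        self.starts[i:j] = [p[0] for p in new]

    def _index_of(self, h: int) -> int:
        # O(log A) binary search for the piece that contains handle h.
        if not 0 <= h < self.next_handle:
            raise KeyError(f"dangling handle {h}")
        return bisect_right(self.starts, h) - 1

    def append(self, value: str) -> int:
        h = self.next_handle
        self.next_handle += 1
        self._splice(len(self.pieces), len(self.pieces), [(h, h, Hot(value))])
        return h

    def resolve(self, h: int) -> Payload:
        return self.pieces[self._index_of(h)][2]

    def supersede(self, h: int, value: str) -> None:
        # Give h its own hot singleton; inside a folded run this turns one
        # piece into up to three, which is how edits raise fragmentation.
        i = self._index_of(h)
        lo, hi, payload = self.pieces[i]
        new: List[Piece] = [(h, h, Hot(value))]
        if lo < h:
            new.insert(0, (lo, h - 1, payload))
        if h < hi:
            new.append((h + 1, hi, payload))
        self._splice(i, i + 1, new)

    def fold(self, lo: int, hi: int, digest_id: str) -> None:
        # Replace the pieces exactly covering [lo, hi] by one digest interval,
        # absorbing neighbours that already point at the same digest.
        i, j = self._index_of(lo), self._index_of(hi)
        if self.pieces[i][0] != lo or self.pieces[j][1] != hi:
            raise ValueError("fold range must align with piece boundaries")
        d = Digest(digest_id)
        if j + 1 < len(self.pieces) and self.pieces[j + 1][2] == d:
            hi, j = self.pieces[j + 1][1], j + 1
        if i > 0 and self.pieces[i - 1][2] == d:
            lo, i = self.pieces[i - 1][0], i - 1
        self._splice(i, j + 1, [(lo, hi, d)])

    def report(self, lo: int, hi: int) -> Iterator[Tuple[int, Payload]]:
        # One O(log A) search, then a walk emitting each of the k handles.
        i = self._index_of(lo)
        while i < len(self.pieces) and self.pieces[i][0] <= hi:
            plo, phi, payload = self.pieces[i]
            for h in range(max(plo, lo), min(phi, hi) + 1):
                yield h, payload
            i += 1

    def snapshot(self) -> "CoalescingIndex":
        # O(A) copy standing in for sharing an immutable persistent root.
        snap = CoalescingIndex()
        snap.pieces, snap.starts = list(self.pieces), list(self.starts)
        snap.next_handle = self.next_handle
        return snap
```

For example, appending eight records gives eight hot singletons ($A = 8$); `fold(0, 3, "d1")` followed by `fold(4, 7, "d1")` coalesces them into a single piece ($A = 1$); and `supersede(3, "x")` splits that run into three pieces ($A = 3$), a small instance of the fragmentation an adversary can force with supersedes. A snapshot taken before the supersede still resolves handle 3 to `Digest("d1")`, while an out-of-range handle raises `KeyError`, modelling a dangling reference.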