Modern buffer pools must support a broader workload mix than classic OLTP alone. In addition to B-tree lookups, database systems increasingly serve scan-heavy analytics and vector-search indexes whose graph traversals produce irregular, high-fan-out access patterns. These workloads require a translation mechanism (mapping logical page IDs to resident frames) that is simultaneously fast across these diverse access patterns, deployable in user space, compatible with huge pages, easy to integrate, and still under DBMS control for eviction and I/O. Existing designs satisfy only subsets of these goals. This paper presents \textbf{\calico}, a practical DBMS-controlled buffer pool built around array-based translation, a decades-old idea that was once dismissed but is now viable on modern hardware. \calico decouples logical translation from OS page tables so that the DBMS can combine low-overhead translation with huge-page-backed frames and fine-grained page management. To make array translation practical and performant for DBMSes with large, sparse, hierarchical page identifiers, \calico introduces three techniques: multi-level translation with path caching, hole punching for reclaiming cold translation memory, and group prefetch to exploit parallelism. Our evaluation across scans, OLTP-style B-tree accesses, and vector search shows that \calico matches or outperforms the existing state of the art both in memory and out of memory. We also implement \calico as a drop-in replacement for PostgreSQL's buffer manager and integrate it with \texttt{pgvector}. In PostgreSQL, \calico speeds up vector search by up to 3.9$\times$ in memory and 6.5$\times$ in larger-than-memory settings, and speeds up scan-heavy queries by up to 3$\times$.