Branchable databases are evolving from developer tools into infrastructure for agentic workloads, which are characterized by speculative mutations and non-linear state exploration. Traditional RDBMS mechanisms such as nested transactions do not provide the persistent isolation and concurrent branch management that autonomous agents require, and recent "zero-copy" designs make different trade-offs whose impact on agentic workloads remains unclear. To clarify this space, we present BranchBench, a benchmark for evaluating branchable relational DBMSes under agentic exploration. We characterize five representative workloads, namely agentic software engineering, failure reproduction, data curation, Monte Carlo tree search (MCTS), and simulation, and design parameterized macrobenchmarks that execute branch-mutate-evaluate loops to reflect these workloads, along with microbenchmarks that isolate branch lifecycle costs. We evaluate state-of-the-art systems, including Neon, DoltgreSQL, Tiger Data, Xata, and PostgreSQL baselines, and find a fundamental tension: systems optimized for fast branching suffer reads that are up to 5-4000x slower as branches deepen, while systems optimized for fast data operations incur 25-1500x higher branch creation and switching latency. Further, no current system supports the representative workloads at scale. These results highlight the need for branch-native DBMSes designed specifically for agentic exploration.
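To make the branch-mutate-evaluate loop and the read-versus-branch tension concrete, the following is a minimal in-memory sketch. It is not BranchBench code and does not model any of the evaluated systems; the `DeltaBranch` class, the `explore` function, and the acceptance rule are hypothetical names chosen for illustration. It assumes a delta-chain design in which creating a branch is constant-time but a read may walk every ancestor.

```python
from __future__ import annotations

from typing import Optional


class DeltaBranch:
    """Toy branch (hypothetical): O(1) creation, reads walk the parent chain."""

    def __init__(self, parent: Optional[DeltaBranch] = None) -> None:
        self.parent = parent
        self.delta: dict[str, int] = {}  # writes made on this branch only

    def branch(self) -> DeltaBranch:
        # Creating a child copies no data, so branching is cheap.
        return DeltaBranch(parent=self)

    def put(self, key: str, value: int) -> None:
        # Writes stay local, so the parent never sees them.
        self.delta[key] = value

    def get(self, key: str) -> tuple[Optional[int], int]:
        """Return (value, number of branch levels visited)."""
        node: Optional[DeltaBranch] = self
        hops = 0
        while node is not None:
            hops += 1
            if key in node.delta:
                return node.delta[key], hops
            node = node.parent
        return None, hops


def explore(root: DeltaBranch, depth: int) -> DeltaBranch:
    """Run a branch-mutate-evaluate loop, keeping each accepted branch."""
    current = root
    for step in range(depth):
        candidate = current.branch()              # branch
        candidate.put(f"step{step}", step)        # mutate
        value, _ = candidate.get(f"step{step}")   # evaluate
        if value == step:                         # trivial acceptance rule (assumption)
            current = candidate
    return current


root = DeltaBranch()
root.put("base", 42)
leaf = explore(root, depth=10)
base_value, hops = leaf.get("base")
```

In this sketch, a read of `base` from the deepest branch must visit every level of the chain, so read cost grows with branch depth while branch creation stays constant. A design that copies all data at branch time would show the opposite profile, which is the kind of trade-off the benchmark is meant to measure.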