Core systems such as key-value stores have historically taken years to build. They are designed to be general so that their cost can be amortized across many deployments, and that generality comes at a significant performance cost. We argue that LLM-based coding agents now make a different approach tractable: Just-in-Time (JIT) Systems. In this approach, the entire system is synthesized from scratch and specialized to its environment, its workload, and its required system properties. We present Jitskit, a JIT system synthesis pipeline, and explore how well it synthesizes key-value stores from spec cards that span different YCSB workloads, deployment constraints (e.g., compute resources), and system properties (e.g., consistency and durability). Jitskit iteratively refines a system implementation and checks it against the specification using an evolving evaluation test suite. The resulting synthesized systems are performant: they beat comparable state-of-the-art systems on all 18 specs tried, with speedups of up to 4.6x over the best off-the-shelf baseline. Naively running Claude Code either reward-hacks or underperforms Jitskit by up to 5.4x. We discuss the challenges we overcame in building Jitskit and our key takeaways.