KAPSO: A Knowledge-grounded framework for Autonomous Program Synthesis and Optimization

We introduce KAPSO, a modular framework for autonomous program synthesis and optimization. Given a natural language goal and an evaluation method, KAPSO iteratively performs ideation, code synthesis and editing, execution, evaluation, and learning to improve a runnable artifact toward measurable objectives. Rather than treating synthesis as the endpoint, KAPSO uses synthesis as an operator within a long-horizon optimization loop, where progress is defined by evaluator outcomes. KAPSO targets long-horizon failures common in coding agents, including lost experimental state, brittle debugging, and weak reuse of domain expertise, by integrating three tightly coupled components. First, a git-native experimentation engine isolates each attempt as a branch, producing reproducible artifacts and preserving provenance across iterations. Second, a knowledge system ingests heterogeneous sources, including repositories, internal playbooks, and curated external resources such as documentation, scientific papers, and web search results, and organizes them into a structured representation that supports retrieval over workflows, implementations, and environment constraints. Third, a cognitive memory layer coordinates retrieval and maintains an episodic store of reusable lessons distilled from experiment traces (run logs, diffs, and evaluator feedback), reducing repeated error modes and accelerating convergence. We evaluate KAPSO on MLE-Bench (Kaggle-style ML competitions) and ALE-Bench (AtCoder heuristic optimization) and report end-to-end performance. Code is available at: https://github.com/Leeroo-AI/kapso
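
The loop described above (ideation, synthesis and editing, execution, evaluation, learning) can be illustrated with a minimal Python sketch. This is not KAPSO's actual API: every name here (`Attempt`, `Memory`, `optimize`, `toy_propose`, `toy_evaluate`) is hypothetical, an in-memory dictionary stands in for git branches, and the "lessons" are plain strings rather than distilled experiment traces. It only shows the shape of an evaluator-driven loop in which each attempt is kept as a separate, named record.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Attempt:
    branch: str   # hypothetical branch name, one per attempt
    code: str     # candidate artifact (source text in this sketch)
    score: float  # evaluator outcome; higher is better (an assumption)


@dataclass
class Memory:
    # Stand-in for an episodic store of reusable lessons.
    lessons: List[str] = field(default_factory=list)

    def learn(self, attempt: Attempt, best: Optional[Attempt]) -> None:
        # Record whether this attempt beat the best one seen so far.
        improved = best is None or attempt.score > best.score
        verdict = "improved" if improved else "did not improve"
        self.lessons.append(f"{attempt.branch}: {verdict} (score={attempt.score})")


def optimize(
    propose: Callable[[Optional[Attempt], List[str]], str],
    evaluate: Callable[[str], float],
    iterations: int,
) -> Tuple[Optional[Attempt], Dict[str, Attempt], Memory]:
    best: Optional[Attempt] = None
    branches: Dict[str, Attempt] = {}  # stand-in for one git branch per attempt
    memory = Memory()
    for i in range(iterations):
        code = propose(best, memory.lessons)   # ideation + synthesis/editing
        score = evaluate(code)                 # execution + evaluation
        attempt = Attempt(f"attempt-{i}", code, score)
        branches[attempt.branch] = attempt     # keep every attempt for provenance
        memory.learn(attempt, best)            # learning
        if best is None or score > best.score:
            best = attempt
    return best, branches, memory


# Toy problem (an assumption for illustration): find x maximizing -(x - 3)^2.
def toy_propose(best: Optional[Attempt], lessons: List[str]) -> str:
    if best is None:
        return "x = 0"
    current = int(best.code.split("=")[1])
    return f"x = {current + 1}"  # edit the best artifact seen so far


def toy_evaluate(code: str) -> float:
    namespace: dict = {}
    exec(code, namespace)  # "run" the candidate artifact
    return float(-(namespace["x"] - 3) ** 2)


best, branches, memory = optimize(toy_propose, toy_evaluate, iterations=5)
print(best.branch, best.code, best.score)
```

In this toy run the evaluator, not the synthesis step, decides what counts as progress: the fifth attempt is still recorded in `branches` and in `memory.lessons`, but it does not replace the best artifact because its score is lower.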
