Current agentic frameworks underperform on long-horizon tasks. As reasoning depth increases, sequential orchestration becomes brittle, context windows impose hard limits that degrade performance, and opaque execution traces make failures difficult to localize or debug. We introduce ROMA (Recursive Open Meta-Agents), a domain-agnostic framework that addresses these limitations through recursive task decomposition and structured aggregation. ROMA decomposes goals into dependency-aware subtask trees that can be executed in parallel, while aggregation compresses and validates intermediate results to control context growth. Our framework standardizes agent construction around four modular roles -- Atomizer (which decides whether a task should be decomposed), Planner, Executor, and Aggregator -- which cleanly separate orchestration from model selection and enable transparent, hierarchical execution traces. This design supports heterogeneous multi-agent systems that mix models and tools according to cost, latency, and capability. To adapt ROMA to specific tasks without fine-tuning, we further introduce GEPA$+$, an improved Genetic-Pareto prompt proposer that searches over prompts within ROMA's component hierarchy while preserving interface contracts. We show that ROMA, combined with GEPA$+$, delivers leading system-level performance on reasoning and long-form generation benchmarks. On SEAL-0, which evaluates reasoning over conflicting web evidence, ROMA instantiated with GLM-4.6 improves accuracy by 9.9\% over Kimi-Researcher. On EQ-Bench, a long-form writing benchmark, ROMA enables DeepSeek-V3 to match the performance of leading closed-source models such as Claude Sonnet 4.5. Our results demonstrate that recursive, modular agent architectures can scale reasoning depth while remaining interpretable, flexible, and model-agnostic.
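The four-role recursion described above can be sketched in a few lines. This is a minimal, illustrative toy, not the framework's actual API: all function names, the `";"`-separated task format, and the depth cap are our own assumptions. Atomizer decides whether to decompose, Planner proposes subtasks, Executor handles atomic tasks, and Aggregator compresses intermediate results.

```python
# Toy sketch of a ROMA-style Atomizer/Planner/Executor/Aggregator loop.
# Names and the ";"-separated task format are illustrative assumptions.

def is_atomic(task: str) -> bool:
    """Atomizer: a task with no ';' separators needs no decomposition."""
    return ";" not in task

def plan(task: str) -> list[str]:
    """Planner: split a composite task into subtasks (here, independent ones)."""
    return [t.strip() for t in task.split(";")]

def execute(task: str) -> str:
    """Executor: stand-in for a model or tool call on an atomic task."""
    return f"done:{task}"

def aggregate(results: list[str]) -> str:
    """Aggregator: compress intermediate results into one validated summary."""
    return "[" + " + ".join(results) + "]"

def solve(task: str, depth: int = 0, max_depth: int = 3) -> str:
    """Recursive control loop: decompose until atomic, then aggregate upward."""
    if depth >= max_depth or is_atomic(task):
        return execute(task)
    return aggregate([solve(t, depth + 1, max_depth) for t in plan(task)])

print(solve("fetch sources; cross-check claims; draft answer"))
```

In the real system, each role would be backed by a different model or tool chosen by cost, latency, and capability, and the aggregator would also validate results to bound context growth.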