Current agentic frameworks underperform on long-horizon tasks. As reasoning depth increases, sequential orchestration becomes brittle, context windows impose hard limits that degrade performance, and opaque execution traces make failures difficult to localize or debug. We introduce ROMA (Recursive Open Meta-Agents), a domain-agnostic framework that addresses these limitations through recursive task decomposition and structured aggregation. ROMA decomposes goals into dependency-aware subtask trees that can be executed in parallel, while aggregation compresses and validates intermediate results to control context growth. Our framework standardizes agent construction around four modular roles -- Atomizer (which decides whether a task should be decomposed), Planner, Executor, and Aggregator -- which cleanly separate orchestration from model selection and enable transparent, hierarchical execution traces. This design supports heterogeneous multi-agent systems that mix models and tools according to cost, latency, and capability. To adapt ROMA to specific tasks without fine-tuning, we further introduce GEPA$+$, an improved Genetic-Pareto prompt proposer that searches over prompts within ROMA's component hierarchy while preserving interface contracts. We show that ROMA, combined with GEPA$+$, delivers leading system-level performance on reasoning and long-form generation benchmarks. On SEAL-0, which evaluates reasoning over conflicting web evidence, ROMA instantiated with GLM-4.6 improves accuracy by 9.9\% over Kimi-Researcher. On EQ-Bench, a long-form writing benchmark, ROMA enables DeepSeek-V3 to match the performance of leading closed-source models such as Claude Sonnet 4.5. Our results demonstrate that recursive, modular agent architectures can scale reasoning depth while remaining interpretable, flexible, and model-agnostic.
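The four roles described above form a recursive control loop: the Atomizer gates decomposition, the Planner emits subtasks, the Executor handles leaves, and the Aggregator compresses results back up the tree. A minimal sketch of that loop is below; the role functions, the `" && "` goal syntax, and the string-based results are illustrative assumptions, not the paper's implementation, where each role would be backed by a model or tool call.

```python
from dataclasses import dataclass

@dataclass
class Task:
    goal: str
    depth: int = 0

def atomizer(task: Task) -> bool:
    # Assumption: a goal with no " && " separator is atomic.
    # In ROMA this decision would be made by a model.
    return len(task.goal.split(" && ")) == 1

def planner(task: Task) -> list[Task]:
    # Split a compound goal into subtasks one level deeper.
    return [Task(g.strip(), task.depth + 1) for g in task.goal.split(" && ")]

def executor(task: Task) -> str:
    # Stand-in for a model/tool invocation on an atomic task.
    return f"done:{task.goal}"

def aggregator(task: Task, results: list[str]) -> str:
    # Compress and validate subtask results into one summary,
    # bounding the context passed back to the parent.
    return f"agg({'; '.join(results)})"

def solve(task: Task) -> str:
    if atomizer(task):
        return executor(task)
    subtasks = planner(task)
    # Independent subtasks could run in parallel; sequential here for clarity.
    results = [solve(t) for t in subtasks]
    return aggregator(task, results)
```

Because each role has a fixed interface (`Task` in, subtasks or a result out), swapping models per role, or per depth, does not touch the orchestration logic, which is the separation the abstract describes.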