Beyond Gemini-3-Pro: Revisiting LLM Routing and Aggregation at Scale

Large Language Models (LLMs) have advanced rapidly, with Gemini-3-Pro setting a new performance milestone. In this work, we explore collective intelligence as an alternative to monolithic scaling and demonstrate that a collaboration of open-source LLMs can surpass Gemini-3-Pro. We first revisit LLM routing and aggregation at scale and identify three key bottlenecks: (1) current training-free routers are limited by a query-based paradigm that focuses solely on textual similarity; (2) recent aggregation methods remain largely static, failing to select appropriate aggregators for different tasks; (3) the complementarity of routing and aggregation remains underutilized. To address these problems, we introduce JiSi, a novel framework designed to unlock the full potential of LLM collaboration through three innovations: (1) Query-Response Mixed Routing, which captures both semantic information and problem difficulty; (2) Support-Set-based Aggregator Selection, which jointly evaluates the aggregation capacity and domain capacity of aggregators; (3) Adaptive Routing-Aggregation Switch, which dynamically leverages the advantages of both routing and aggregation. Comprehensive experiments on nine benchmarks demonstrate that JiSi can surpass Gemini-3-Pro at only 47% of the cost by orchestrating ten open-source LLMs, while also outperforming mainstream baselines. These results suggest that collective intelligence represents a novel path towards Artificial General Intelligence (AGI).
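The abstract names Query-Response Mixed Routing but does not describe its mechanics. As a rough illustration only, the following sketch shows one way a training-free router could mix query similarity with response similarity over a small support set. Everything here is an assumption rather than the JiSi method: the support-set format, the cheap "probe" response, the bag-of-words similarity (a stand-in for real embeddings), the `alpha` mixing weight, and the names `mixed_route`, `model_a`, and `model_b` are all hypothetical.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words vector; a stand-in for a real sentence embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mixed_route(query, probe_response, support_set, alpha=0.5, k=2):
    """Pick a model using a mix of query and response similarity.

    Hypothetical scheme: each support example stores a past query, a past
    response, and which models answered it correctly. The k most similar
    examples vote for the models that solved them, weighted by similarity.
    """
    q_vec, r_vec = bow(query), bow(probe_response)
    scored = []
    for ex in support_set:
        sim = (alpha * cosine(q_vec, bow(ex["query"]))
               + (1 - alpha) * cosine(r_vec, bow(ex["response"])))
        scored.append((sim, ex))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    votes = Counter()
    for sim, ex in scored[:k]:
        for model, was_correct in ex["correct"].items():
            votes[model] += sim if was_correct else 0.0
    return max(votes, key=votes.get)


# Toy support set with made-up entries, for illustration only.
SUPPORT = [
    {"query": "integrate x squared from 0 to 1",
     "response": "the integral equals one third",
     "correct": {"model_a": True, "model_b": False}},
    {"query": "write a python function to reverse a list",
     "response": "def reverse(xs): return xs[::-1]",
     "correct": {"model_a": False, "model_b": True}},
]

if __name__ == "__main__":
    print(mixed_route("integrate x cubed from 0 to 2",
                      "the integral equals four", SUPPORT))
```

Setting `alpha=1.0` in this sketch reduces it to a purely query-based router, the paradigm the abstract identifies as a bottleneck; lower values let response-side evidence influence the choice.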

