This paper introduces the N-Way Self-Evaluating Deliberation (NSED) protocol, a Runtime Mixture-of-Models (MoM) architecture that constructs emergent composite models from a plurality of distinct expert agents. Unlike traditional Mixture-of-Experts (MoE) architectures, which rely on static gating networks, NSED employs a Dynamic Expertise Broker: a runtime optimization engine that treats model selection as a variation of the Knapsack Problem, binding heterogeneous checkpoints to functional roles based on live telemetry and cost constraints. At the execution layer, we formalize deliberation as a Macro-Scale Recurrent Neural Network (RNN), in which the consensus state loops back through a semantic forget gate to enable iterative refinement without proportional VRAM scaling. Key components include an orchestration fabric for trustless N-to-N peer review, a Quadratic Voting activation function for non-linear consensus, and a feedback-driven state update. Empirical validation on challenging benchmarks (AIME 2025, LiveCodeBench) demonstrates that this topology allows ensembles of small (under 20B parameters) consumer-grade models to match or exceed the performance of state-of-the-art 100B+ parameter models, establishing a new hardware arbitrage efficiency frontier. Furthermore, testing on the DarkBench safety suite reveals intrinsic alignment properties, with peer-mediated correction reducing sycophancy scores below those of any individual agent.
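To make the idea of quadratic voting as a non-linear consensus step concrete, the sketch below shows the textbook mechanism: an agent that spends c credits on a proposal buys sqrt(c) votes, so concentrated spending yields diminishing influence. This is only an illustration under stated assumptions, not the NSED activation function itself, whose exact form is not given in this abstract. The names `quadratic_tally` and `winner`, the ballot layout, and the default budget of 100 credits are all hypothetical.

```python
import math
from collections import defaultdict


def quadratic_tally(ballots, budget=100.0):
    """Aggregate credit allocations under standard quadratic voting.

    ballots: {agent: {proposal: credits}}; negative credits vote against.
    Spending c credits buys sqrt(c) votes (hypothetical layout, not NSED's).
    """
    totals = defaultdict(float)
    for agent, allocation in ballots.items():
        # Each agent may spend at most `budget` credits in total.
        spent = sum(abs(c) for c in allocation.values())
        if spent > budget + 1e-9:
            raise ValueError(f"{agent} overspent: {spent} > {budget}")
        for proposal, credits in allocation.items():
            # Signed square root: vote strength grows sub-linearly in credits.
            votes = math.copysign(math.sqrt(abs(credits)), credits)
            totals[proposal] += votes
    return dict(totals)


def winner(ballots, budget=100.0):
    """Return the proposal with the highest quadratic vote total."""
    totals = quadratic_tally(ballots, budget)
    return max(totals, key=totals.get)


if __name__ == "__main__":
    ballots = {
        "agent_a": {"x": 100},  # one agent spends everything on x
        "agent_b": {"y": 36},   # two agents back y moderately
        "agent_c": {"y": 36},
    }
    print(quadratic_tally(ballots))
    print(winner(ballots))
```

In this toy ballot, proposal `y` collects fewer total credits than `x` (72 versus 100) but more votes (12 versus 10), which is the non-linear effect the mechanism is meant to produce: broad moderate agreement outweighs a single strongly committed voter.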