Mixture-of-Experts (MoE) embedding models combine expert outputs using weighted linear summation, implicitly assuming a linear subspace structure in the embedding space. This assumption is shown to be inconsistent with the geometry of expert representations. Geometric analysis of a modern MoE embedding model reveals that expert outputs lie on a shared hyperspherical manifold characterized by tightly concentrated norms and substantial angular separation. Under this geometry, linear aggregation induces inward collapse toward the manifold interior, distorting vector magnitude and direction and reducing embedding comparability. To address this inconsistency, Spherical Barycentric Aggregation (SBA) is introduced as a geometry-preserving aggregation operator that separates radial and angular components to maintain hyperspherical structure while remaining fully compatible with existing routing mechanisms. Experiments on selected tasks from the Massive Text Embedding Benchmark (MTEB), including semantic similarity, clustering, and duplicate question detection, demonstrate consistent performance improvements at identical training cost and with no loss of training stability. Additional geometric analyses confirm that SBA prevents aggregation-induced collapse and preserves hyperspherical consistency, highlighting the importance of geometry-aware aggregation in MoE embedding architectures.
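The collapse described above follows from elementary geometry: a convex combination of unit vectors with angular separation always has norm below one. The following is a minimal numerical sketch of this effect in NumPy; the dimension, expert count, and softmax router weights are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts = 256, 8  # illustrative dimension and expert count

# Simulated expert outputs on a shared hypersphere: unit norm with
# substantial angular separation (random high-dimensional directions
# are nearly orthogonal).
experts = rng.standard_normal((n_experts, d))
experts /= np.linalg.norm(experts, axis=1, keepdims=True)

# Illustrative router weights: a softmax over random logits, summing to 1.
logits = rng.standard_normal(n_experts)
weights = np.exp(logits) / np.exp(logits).sum()

# Standard MoE aggregation: weighted linear summation of expert outputs.
linear = weights @ experts

print("expert norms:   ", np.linalg.norm(experts, axis=1).round(3))  # all 1.0
print("aggregated norm:", round(float(np.linalg.norm(linear)), 3))   # << 1.0
```

With near-orthogonal expert directions the aggregated norm falls to roughly the square root of the sum of squared weights (about 1/√8 ≈ 0.35 for near-uniform weights over eight experts), pulling the aggregate well inside the hypersphere on which the individual expert outputs lie.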
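The abstract states SBA's principle (separate the radial and angular components and aggregate each on its own geometry) but not its exact form. Below is a hedged sketch of what such a decomposition could look like; the function name, the chordal-mean approximation of the spherical barycenter, and the `eps` guard are all illustrative assumptions rather than the paper's definitive implementation.

```python
import numpy as np

def spherical_barycentric_aggregation(experts, weights, eps=1e-8):
    """Geometry-preserving aggregation sketch (assumed form, not the
    paper's exact operator).

    Splits each expert output into a radial component (its norm) and an
    angular component (its direction), aggregates the two separately,
    and recombines, so the result stays near the experts' shared
    hypersphere instead of collapsing toward its interior.
    """
    norms = np.linalg.norm(experts, axis=1)                  # radial parts
    directions = experts / np.maximum(norms, eps)[:, None]   # angular parts

    # Radial aggregation: ordinary weighted mean of the expert norms.
    radius = weights @ norms

    # Angular aggregation: weighted chordal mean projected back onto the
    # unit sphere, a first-order approximation of the spherical (Karcher)
    # barycenter; an exact barycenter would be computed iteratively.
    mean_dir = weights @ directions
    mean_dir = mean_dir / np.maximum(np.linalg.norm(mean_dir), eps)

    return radius * mean_dir
```

Under this sketch the output norm is the weighted mean of the expert norms and the direction lies on the unit sphere, so the aggregate stays on the experts' tightly concentrated shell; the routing weights are consumed unchanged, consistent with the abstract's claim of full compatibility with existing routing mechanisms.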