The Hardware Lottery posits that research directions are dictated by available silicon compute platforms. We identify a derivative phenomenon, the Hyperscale Lottery, in which model architectures are optimized for cloud throughput at the expense of algorithmic efficiency. State-Space Models (SSMs) such as Mamba were lauded for their linear complexity in sequence length, which makes them well suited to edge intelligence, yet their evolution from Mamba-1 to Mamba-3 reveals a systematic divergence from edge-native efficiency. We demonstrate that Mamba-3's architectural changes, designed to saturate hyperscale GPUs, impose a significant edge penalty: a 28% latency increase at 880M parameters, worsening to 48% for 15M-parameter models. We argue for decoupling cloud-scale saturation strategies from core architectural design to preserve the viability of single-user, real-time edge intelligence.