The typical Selective State-Space Model (SSM) used in Mamba addresses several limitations of Transformers, such as quadratic computational complexity with respect to sequence length and the significant memory requirements during inference caused by the key-value (KV) cache. However, the growing size of Mamba models still poses challenges for training and deployment, owing to their substantial compute and memory demands. In this work, we introduce $\texttt{Bi-Mamba}$, a scalable and powerful 1-bit Mamba architecture designed to enable more efficient large language models (LLMs), with model sizes of 780M, 1.3B, and 2.7B parameters. $\texttt{Bi-Mamba}$ models are trained from scratch on a standard LLM-scale dataset using an autoregressive distillation loss. Extensive experiments on language modeling benchmarks demonstrate that $\texttt{Bi-Mamba}$ achieves performance comparable to its full-precision (FP16 or BF16) counterparts, while outperforming post-training binarization (PTB) Mamba and binarization-aware training (BAT) Transformer baselines. Moreover, $\texttt{Bi-Mamba}$ drastically reduces memory usage and computational cost compared to the original Mamba. Our work pioneers a new line of linear-complexity LLMs under low-bit representation and paves the way for the design of specialized hardware optimized for efficient 1-bit Mamba-based models. Code and pre-trained weights are available at https://github.com/Tangshengku/Bi-Mamba.
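As a rough illustration of what 1-bit weight binarization with binarization-aware training typically involves (sign quantization with a learned or computed scaling factor and a straight-through estimator), the following minimal sketch shows a binarized linear projection. This is not the authors' exact Bi-Mamba formulation; the class name `BinarizedLinear` and the per-output-channel scaling are illustrative assumptions.

```python
# Hypothetical sketch of binarization-aware training for a linear projection,
# NOT the exact Bi-Mamba implementation.
import torch
import torch.nn as nn


class BinarizedLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Latent full-precision weights are kept so the optimizer can update them.
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Per-output-channel scaling factor recovers the magnitude lost by sign().
        alpha = w.abs().mean(dim=1, keepdim=True)
        w_bin = alpha * torch.sign(w)
        # Straight-through estimator: forward pass uses the 1-bit weights,
        # backward pass routes gradients to the latent full-precision weights.
        w_ste = w + (w_bin - w).detach()
        return nn.functional.linear(x, w_ste)


# Usage: a drop-in replacement for nn.Linear inside a Mamba-style block.
layer = BinarizedLinear(1024, 2048)
y = layer(torch.randn(4, 16, 1024))
```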
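The abstract also mentions training from scratch with an autoregressive distillation loss. A hedged, token-level sketch of such a loss is given below: the 1-bit student matches a full-precision teacher's next-token distribution at every position. The temperature and weighting are illustrative assumptions and may differ from the loss actually used for Bi-Mamba.

```python
# Illustrative autoregressive (token-level) distillation loss sketch.
import torch
import torch.nn.functional as F


def autoregressive_distillation_loss(student_logits: torch.Tensor,
                                     teacher_logits: torch.Tensor,
                                     temperature: float = 1.0) -> torch.Tensor:
    """Both logit tensors have shape (batch, seq_len, vocab_size)."""
    # Flatten batch and sequence dims so 'batchmean' averages over all positions.
    s = F.log_softmax(student_logits / temperature, dim=-1).flatten(0, 1)
    t = F.softmax(teacher_logits / temperature, dim=-1).flatten(0, 1)
    # KL(teacher || student), scaled by T^2 as is conventional for distillation.
    return F.kl_div(s, t, reduction="batchmean") * (temperature ** 2)


# Usage with hypothetical student/teacher models:
# loss = autoregressive_distillation_loss(student(input_ids), teacher(input_ids).detach())
```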