Large language models (LLMs) have revolutionized artificial intelligence, yet their massive memory and computational demands necessitate aggressive quantization, increasingly pushing representations toward the theoretical limit of a single bit. While complex-valued LLMs, such as iFairy, offer greater representational efficiency at low bit widths than their real-valued counterparts, they must be trained from scratch and therefore cannot leverage the vast ecosystem of pre-trained real-valued foundation models. Here we present Fairy2i, a universal framework that transforms pre-trained real-valued layers into an equivalent widely-linear complex form, enabling extremely low-bit quantization while reusing existing checkpoints. By proving a lossless mathematical equivalence between real and widely-linear maps, we convert standard Transformers into the complex domain and employ a phase-aware quantization scheme whose codebook consists of the fourth roots of unity. Furthermore, we introduce a recursive residual quantization mechanism that iteratively minimizes quantization error, allowing inference to proceed via efficient multiplication-free accumulation. We demonstrate that Fairy2i restores the performance of LLaMA-2 7B at an effective 2-bit precision to near full-precision levels, significantly outperforming state-of-the-art real-valued binary and ternary quantization methods. This work bridges the gap between the representational efficiency of complex-valued arithmetic and the practical utility of pre-trained models, paving the way for efficient inference on commodity hardware.
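To make the phase-aware quantization and recursive residual steps concrete, the following is a minimal sketch, not the paper's implementation: it maps each complex weight to the nearest element of the {1, i, -1, -i} codebook, fits a real scale, and then re-quantizes the remaining error for a fixed number of rounds. The function names (`quantize_phase`, `quantize_residual`), the per-matrix scale, and the round count are illustrative assumptions rather than details taken from Fairy2i.

```python
# Illustrative sketch only: phase-aware quantization onto the fourth roots of
# unity with a recursive residual pass. Scaling granularity and round count
# are assumptions, not the paper's exact scheme.
import numpy as np

CODEBOOK = np.array([1, 1j, -1, -1j])  # fourth roots of unity {1, i, -1, -i}

def quantize_phase(W):
    """Snap each complex entry to the codeword nearest in phase and fit one
    real scale for the whole matrix (one possible scaling choice)."""
    # Re(w * conj(c)) = |w| cos(angle(w) - angle(c)); argmax picks the nearest phase.
    idx = np.argmax(np.real(W[..., None] * np.conj(CODEBOOK)), axis=-1)
    Q = CODEBOOK[idx]
    # Least-squares real scale: alpha = Re(<Q, W>) / ||Q||^2, with |Q_ij| = 1.
    alpha = np.real(np.vdot(Q, W)) / Q.size
    return alpha, Q

def quantize_residual(W, rounds=2):
    """Recursively quantize the leftover error, so W ~ sum_k alpha_k * Q_k."""
    terms, residual = [], W.astype(complex)
    for _ in range(rounds):
        alpha, Q = quantize_phase(residual)
        terms.append((alpha, Q))
        residual = residual - alpha * Q
    return terms

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
    terms = quantize_residual(W, rounds=2)
    W_hat = sum(a * Q for a, Q in terms)
    print("relative error:", np.linalg.norm(W - W_hat) / np.linalg.norm(W))
```

Because every codeword is one of ±1 or ±i, multiplying an activation by a quantized weight reduces to sign flips and swaps of real and imaginary parts, which is why the abstract can describe inference as multiplication-free accumulation.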