Weight binarization has emerged as a promising strategy to reduce the complexity of large language models (LLMs). Existing approaches fall into two categories: post-training binarization, which is simple but causes severe performance loss, and training-aware methods, which depend on full-precision latent weights, adding complexity and limiting efficiency. We propose a novel framework that represents LLMs with multi-kernel Boolean parameters and, for the first time, enables direct finetuning of LLMs in the Boolean domain, eliminating the need for latent weights. This enhances representational capacity and dramatically reduces complexity during both finetuning and inference. Extensive experiments across diverse LLMs show that our method outperforms recent ultra-low-bit quantization and binarization techniques.