Existing low-bit Microscaling (MX) formats, such as MXFP4, often suffer substantial accuracy degradation due to their use of a shared power-of-two scaling factor per block. In this work, we explore strategies that introduce minimal metadata to recover the accuracy lost during quantization while maintaining high bit efficiency across a wide range of large language models. We propose a complete algorithm-hardware co-design based on flexible metadata, featuring online quantization with a simple encoding. To support the proposed method efficiently, we implement a lightweight hardware unit and integrate it into the accelerator. Evaluation results demonstrate that our method substantially narrows the accuracy gap, achieving on average a 70.63% reduction in accuracy loss compared to MXFP4 and a 37.30% reduction relative to the latest NVFP4 on LLM benchmarks. Furthermore, our design delivers up to 1.91$\times$ speedup and 1.75$\times$ energy savings over state-of-the-art accelerators. Our code is available at https://github.com/SJTU-ReArch-Group/M2XFP_ASPLOS26.
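The shared power-of-two scaling that the abstract identifies as the source of MXFP4's accuracy loss can be illustrated with a minimal sketch of MX-style block quantization. This is an illustrative assumption, not the paper's method: it uses the E2M1 (FP4) element grid, but simplifies the block size, the E8M0 scale encoding, and rounding behavior of the real MX specification; `quantize_mx_block` and `FP4_GRID` are hypothetical names.

```python
import math

# Representable magnitudes of an FP4 E2M1 element (MXFP4's element type).
FP4_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_mx_block(block):
    """Quantize one block with a single shared power-of-two scale.

    Illustrative only: real MX formats use 32-element blocks and an E8M0
    scale; overflow and rounding handling here is simplified.
    """
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return [0.0 for _ in block], 1.0
    # Smallest power of two that maps the block maximum into [0, 6].
    exp = math.ceil(math.log2(amax / FP4_GRID[-1]))
    scale = 2.0 ** exp

    def nearest(v):
        # Round the scaled value to the closest FP4 magnitude, keeping sign.
        mag = min(FP4_GRID, key=lambda g: abs(abs(v) - g))
        return math.copysign(mag, v)

    return [scale * nearest(v / scale) for v in block], scale
```

Because every element in the block is forced onto the same power-of-two scale, small-magnitude elements that share a block with a large outlier are rounded coarsely, which is the accuracy loss the proposed metadata aims to recover.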