This paper introduces HiFloat4 (HiF4), a block floating-point data format tailored for deep learning. Each HiF4 unit packs 64 4-bit elements together with 32 bits of shared scaling metadata, averaging 4.5 bits per value. The metadata encodes a three-level scaling hierarchy that captures both inter- and intra-group dynamic range while improving utilization of the representational space. In addition, the large 64-element group size allows matrix multiplications to be executed largely in fixed-point arithmetic, significantly reducing hardware area and power consumption. To evaluate the proposed format, we conducted inference experiments on several language models, including LLaMA, Qwen, Mistral, DeepSeek-V3.1, and LongCat. Results show that HiF4 achieves higher average accuracy than the state-of-the-art NVFP4 format across multiple models and diverse downstream tasks.
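The per-value storage cost stated above follows directly from the unit layout. A minimal sketch of the bit-budget arithmetic (illustrative only; the constant names are ours, and the actual packing layout is defined by the format specification):

```python
# Bit budget of one HiF4 unit: 64 elements x 4 bits, plus 32 bits of
# shared scaling metadata, amortized over the 64 elements.
ELEMENTS_PER_UNIT = 64   # elements sharing one metadata block
BITS_PER_ELEMENT = 4     # 4-bit payload per element
METADATA_BITS = 32       # shared three-level scaling metadata

total_bits = ELEMENTS_PER_UNIT * BITS_PER_ELEMENT + METADATA_BITS
bits_per_value = total_bits / ELEMENTS_PER_UNIT
print(total_bits, bits_per_value)  # 288 bits per unit, 4.5 bits per value
```

The same arithmetic explains the trade-off against smaller-group formats: amortizing the metadata over 64 elements keeps the average cost low, at the price of a coarser shared scale.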