This paper introduces HiFloat4 (HiF4), a block floating-point data format tailored for deep learning. Each HiF4 unit packs 64 4-bit elements with 32 bits of shared scaling metadata, averaging 4.5 bits per value. The metadata specifies a three-level scaling hierarchy that captures both inter- and intra-group dynamic range while improving utilization of the representational space. In addition, the large 64-element group size allows matrix multiplications to be executed largely with fixed-point arithmetic, significantly reducing hardware area and power consumption. To evaluate the proposed format, we conducted inference experiments on several language models, including LLaMA, Qwen, Mistral, DeepSeek-V3.1, and LongCat. Results show that HiF4 achieves higher average accuracy than the state-of-the-art NVFP4 format across multiple models and diverse downstream tasks.
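The 4.5-bits-per-value figure follows directly from the block layout stated above. As a minimal sketch (not part of the paper; the constant names are illustrative), the storage accounting per HiF4 block is:

```python
# Storage accounting for one HiF4 block, assuming the layout described
# in the abstract: 64 four-bit elements plus 32 bits of shared scaling
# metadata per block.
ELEMENTS_PER_BLOCK = 64
BITS_PER_ELEMENT = 4
METADATA_BITS = 32

total_bits = ELEMENTS_PER_BLOCK * BITS_PER_ELEMENT + METADATA_BITS  # 256 + 32 = 288
bits_per_value = total_bits / ELEMENTS_PER_BLOCK

print(bits_per_value)  # 4.5
```

Amortizing the shared metadata over the large group is what keeps the average cost close to the raw 4-bit element width.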