As the demand for deep learning grows, reducing cost through quantization has become essential for both training and inference. In 2022, the Open Compute Project (OCP) consortium standardized a family of narrow-precision formats for deep learning, called the microscaling (MX) formats. MX is a hardware-friendly dynamic quantization scheme that reduces data size by sharing an 8-bit exponent across multiple operands. MX formats fall into two types, each with its own strength: (i) MXINT, whose elements consist only of mantissa bits and therefore favor precision, and (ii) MXFP, which gives each element local exponent bits and therefore favors a wider dynamic range. In this work, we present a versatile MXFP format, called MX-SAFE (MXSF for short), that adaptively uses two modes, a wider-mantissa mode (FP8 E2M5) and a subnormal FP mode (FP5 E3M2), to support both training and direct-cast inference. Furthermore, we propose a tile-based block design that increases hardware efficiency by reducing the re-quantization burden during training with the MXSF format. With the proposed MXSF format, accuracy for inference/full training improves on average by 0.05%/11.1% over MXFP8 E2M5 and by 3.55%/3.57% over MXFP8 E4M3. Moreover, we present a training-inference accelerator that supports the MXSF format; it achieves accuracy similar to the BF16 baseline while consuming 24.9% less total energy.
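
To make the shared-exponent idea concrete, the following is a minimal sketch of MXINT-style block quantization, not the MXSF format proposed in this work. The function names `mx_quantize_block` and `mx_dequantize_block`, the 7 magnitude bits per element, and the clamping range of the shared exponent are illustrative assumptions rather than details taken from the MX specification or from our design.

```python
import math


def mx_quantize_block(values, mantissa_bits=7):
    """Quantize one block to (shared_exp, signed integers).

    Hypothetical helper: every element in the block reuses one
    power-of-two exponent, so only small integers are stored per element.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1,
    # so floor(log2(max_abs)) is exactly e - 1.
    _, e = math.frexp(max_abs)
    shared_exp = e - 1
    # Assumed range for an 8-bit biased exponent (bias 127).
    shared_exp = max(-127, min(127, shared_exp))
    # One integer step is worth 2**(shared_exp - (mantissa_bits - 1)).
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    q_max = 2 ** mantissa_bits - 1
    # Round to nearest and saturate to the signed integer range.
    ints = [max(-q_max, min(q_max, round(v / scale))) for v in values]
    return shared_exp, ints


def mx_dequantize_block(shared_exp, ints, mantissa_bits=7):
    """Reconstruct approximate real values from the shared exponent."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [q * scale for q in ints]


block = [0.75, -1.5, 3.0, 0.1]
shared_exp, ints = mx_quantize_block(block)
print(shared_exp, ints)                       # 1 [24, -48, 96, 3]
print(mx_dequantize_block(shared_exp, ints))  # [0.75, -1.5, 3.0, 0.09375]
```

In this example the largest element, 3.0, sets the shared exponent, and the smallest element, 0.1, absorbs the largest relative error (it is reconstructed as 0.09375). This is the loss of dynamic range within a block that local exponent bits in MXFP-style elements are meant to reduce.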