eXmY is a novel data type for quantizing ML models. It supports arbitrary bit widths and arbitrary integer and floating-point formats. For example, it seamlessly supports 3-, 5-, 6-, 7- and 9-bit formats. For a given bit width, say 7, it defines all possible formats: e0m6, e1m5, e2m4, e3m3, e4m2, e5m1 and e6m0. For non-power-of-two bit widths such as 5, 6 and 7, we created a novel encoding and decoding scheme that achieves perfect compression and byte addressability and is amenable to sharding and vector processing. We implemented libraries for emulation and for encoding and decoding tensors and checkpoints in C++, TensorFlow, JAX and PAX. For optimal performance, the codecs use SIMD instructions on CPUs and vector instructions on TPUs and GPUs. eXmY is also a technique: it exploits the statistical distribution of exponents in tensors. It can be used to quantize weights, static and dynamic activations, gradients, master weights and optimizer state. It can reduce memory use (CPU DRAM and accelerator HBM) as well as network and disk storage and transfers, increase multi-tenancy and accelerate compute. eXmY has been deployed in production for almost 2 years.
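To make the format naming and the notion of padding-free storage concrete, here is a minimal Python sketch. It rests on two assumptions that are not stated in the text: that one bit of each value is a sign bit (inferred from the 7-bit example, where exponent and mantissa bits sum to 6), and that a plain LSB-first bit stream is an acceptable stand-in for the codec. The function names `formats_for_width`, `pack_codes` and `unpack_codes` are hypothetical, and the packing shown is generic bit packing, not the encoding scheme the abstract refers to; in particular it makes no attempt at byte addressability or vectorization.

```python
def formats_for_width(bits: int) -> list[str]:
    """List eXmY format names for a total bit width.

    Assumption: one bit is the sign, so exponent + mantissa = bits - 1.
    """
    if bits < 1:
        raise ValueError("bit width must be at least 1")
    payload = bits - 1
    return [f"e{e}m{payload - e}" for e in range(payload + 1)]


def pack_codes(codes: list[int], bits: int) -> bytes:
    """Pack unsigned `bits`-wide codes into bytes with no padding between codes.

    Generic LSB-first bit stream; only the final byte may hold unused bits.
    """
    acc = 0      # bit accumulator
    nbits = 0    # number of valid bits currently in acc
    out = bytearray()
    for code in codes:
        if not 0 <= code < (1 << bits):
            raise ValueError(f"code {code} does not fit in {bits} bits")
        acc |= code << nbits
        nbits += bits
        while nbits >= 8:          # flush every complete byte
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:                      # flush the partial last byte, if any
        out.append(acc & 0xFF)
    return bytes(out)


def unpack_codes(data: bytes, bits: int, count: int) -> list[int]:
    """Inverse of pack_codes: recover `count` codes of `bits` bits each."""
    mask = (1 << bits) - 1
    acc = 0
    nbits = 0
    out: list[int] = []
    for byte in data:
        acc |= byte << nbits
        nbits += 8
        while nbits >= bits and len(out) < count:
            out.append(acc & mask)
            acc >>= bits
            nbits -= bits
    return out


if __name__ == "__main__":
    print(formats_for_width(7))
    codes = [0, 1, 127, 64, 5, 99, 3, 42]   # eight 7-bit codes
    packed = pack_codes(codes, 7)
    print(len(packed), unpack_codes(packed, 7, len(codes)) == codes)
```

Under these assumptions, eight 7-bit codes occupy 56 bits, which is exactly 7 bytes, so no storage is wasted relative to the nominal bit width. A production codec would also need the addressability and SIMD-friendliness properties described above, which this sketch does not attempt.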