Closing the Gap Between Float and Posit Hardware Efficiency

The b-posit, or bounded posit, is a variation of the posit format designed for high-performance computing (HPC) and AI applications. Unlike traditional floating-point formats (floats), posits use variable-length fields for exponent scaling and significand, providing better efficiency for the same bit width. However, that flexibility makes the worst-case decode-encode logic expensive, costing more than subnormal handling does for floats. To address this, the b-posit limits the regime field to at most 6 bits, which reduces the variability of the regime and fraction sizes. With an exponent size eS of 5 bits, the dynamic range is $2^{-192}$ to $2^{192}$ (about $10^{-58}$ to $10^{58}$) and the quire size is 800 bits, for any precision $n>12$. This constraint improves numerical properties and simplifies hardware implementation, because decoding and encoding can be done with basic multiplexers. Compared to standard posit decoders, our 32-bit b-posit decoder circuits consume 79 percent less power, occupy 71 percent less area, and have 60 percent lower latency. The 32-bit b-posit encoder uses 68 percent less power, 46 percent less area, and has 44 percent shorter delay. The b-posit hardware also scales better with bit width than standard posit hardware, and its advantages are even greater at 64 bits. Notably, the b-posit decode-encode hardware matches or beats IEEE-compliant 32-bit floating-point hardware: it is faster and smaller, with a slight increase in worst-case power due to the higher speed. The b-posit hardware design provides the clean mathematical behavior and higher accuracy of posits versus IEEE floats without the power, area, or latency costs observed for the Posit Standard (2022). We believe the b-posit should influence future standard revisions.
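The stated dynamic range can be checked with a short Python sketch. It assumes, as our reading of the abstract rather than a confirmed definition, that the b-posit keeps the posit run-length regime code and simply stops the run at 6 bits, and it handles magnitudes only (no sign or two's-complement handling). The names `decode_regime` and `scale_exponent_bounds` are hypothetical helpers introduced for illustration.

```python
import math

ES = 5          # exponent field width, from the abstract
MAX_REGIME = 6  # regime field cap, from the abstract


def decode_regime(window: int) -> tuple[int, int]:
    """Decode the 6 bits that follow the sign bit (MSB first).

    Assumption: posit-style run-length regime, capped at MAX_REGIME bits.
    Returns (regime value r, number of bits the regime occupies).
    """
    if not 0 <= window < 2 ** MAX_REGIME:
        raise ValueError("window must fit in MAX_REGIME bits")
    bits = format(window, f"0{MAX_REGIME}b")
    lead = bits[0]
    run = len(bits) - len(bits.lstrip(lead))  # length of the leading run, 1..6
    # A run shorter than the cap is closed by one opposite bit;
    # a run that reaches the cap needs no terminator.
    length = min(run + 1, MAX_REGIME)
    r = run - 1 if lead == "1" else -run
    return r, length


def scale_exponent_bounds() -> tuple[int, int]:
    """Return (lo, hi) such that magnitudes lie between 2**lo and 2**hi."""
    regimes = [decode_regime(w)[0] for w in range(2 ** MAX_REGIME)]
    lo = min(regimes) * 2 ** ES                   # r = -6, exponent field 0
    hi = max(regimes) * 2 ** ES + (2 ** ES - 1) + 1  # r = 5, exponent 31, fraction just below 2
    return lo, hi


lo, hi = scale_exponent_bounds()
print(lo, hi)                         # -192 192
print(round(hi * math.log10(2), 1))   # 57.8, i.e. about 10**58
```

Under this assumption the regime occupies only 2 to 6 bits, so the exponent and fraction fields can start at just five possible positions. That small, fixed set of alignments is consistent with the claim that decoding needs only basic multiplexers, and the sign, 6-bit regime, and 5-bit exponent together take at most 12 bits, which matches the condition $n>12$.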
