Blockchain consensus mechanisms based on Proof-of-Work consume significant energy; Bitcoin alone is estimated to use approximately 150 TWh per year. Proof-of-Space reduces this cost by replacing repeated computation with storage, but plot generation remains bottlenecked by CPU hashing throughput. Prior work on VaultX demonstrated a high-performance CPU-based Proof-of-Space plotter using multi-threaded Blake3 hashing, with plotting speeds 4x to 50x faster than Chia depending on hardware configuration. In this paper, we present VaultxGPU, a GPU-accelerated extension of the VaultX plotter that offloads the Blake3 hashing pipeline to the GPU using custom kernels. We implement the plotter in both CUDA for NVIDIA hardware and SYCL for AMD and Intel GPUs, keeping Table 1 entirely in GPU VRAM and fusing the sort and match stages into a single kernel to minimize data movement. We evaluate VaultxGPU for K values 27 through 31 against CPU baselines. Our SYCL GPU implementation achieves a 59.2x speedup over a single-threaded CPU baseline, completing a K=31 plot in 45.4 seconds compared to 2688 seconds, and also outperforms the best CPU configuration, which uses 384 threads. These results indicate that GPU acceleration is an effective direction for scaling Proof-of-Space plotting beyond what CPU parallelism can achieve.