Point-based Differentiable Rendering (PBDR) enables high-fidelity 3D scene reconstruction, but scaling PBDR to high resolutions and large scenes requires efficient distributed training systems. Existing systems are tightly coupled to a specific PBDR method and suffer from severe communication overhead due to poor data locality. In this paper, we present Gaian, a general distributed training system for PBDR. Gaian provides a unified API that is expressive enough to support existing PBDR methods while exposing rich data-access information, which Gaian leverages to optimize locality and reduce communication. We evaluate Gaian by implementing four PBDR algorithms. These implementations achieve high performance and resource efficiency: across six datasets and up to 128 GPUs, Gaian reduces communication by up to 91% and improves training throughput by 1.50x-3.71x.