We present fVDB, a novel GPU-optimized framework for deep learning on large-scale 3D data. fVDB provides a complete set of differentiable primitives, such as convolution, pooling, attention, ray tracing, and meshing, for building deep learning architectures for common tasks in 3D learning. fVDB offers a much larger feature set (primitives and operators) than established frameworks with no loss in efficiency: our operators match or exceed the performance of other frameworks with narrower scope. Furthermore, fVDB can process datasets with a much larger footprint and spatial resolution than prior work, while maintaining a competitive memory footprint on small inputs. To achieve this combination of versatility and performance, fVDB relies on a single novel VDB index grid acceleration structure paired with several key innovations, including GPU-accelerated sparse grid construction, convolution using Tensor Cores, fast ray tracing kernels using a Hierarchical Digital Differential Analyzer (HDDA) algorithm, and jagged tensors. Our framework is fully integrated with PyTorch, enabling interoperability with existing pipelines, and we demonstrate its effectiveness on a number of representative tasks, including large-scale point cloud segmentation, high-resolution 3D generative modeling, unbounded-scale Neural Radiance Fields, and large-scale point cloud reconstruction.
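As an illustration of the jagged tensors mentioned above, the following is a minimal sketch of the general idea: a batch whose elements have different lengths is stored as one flat buffer plus an offsets array. The class `JaggedBatch` and its methods are hypothetical names chosen for this example and are not fVDB's API; plain Python lists stand in for GPU tensors.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class JaggedBatch:
    """Hypothetical container (not fVDB's API): flat values plus row offsets."""

    values: List[float]  # items of all batch elements, concatenated
    offsets: List[int]   # values[offsets[i]:offsets[i + 1]] belongs to element i

    @classmethod
    def from_lists(cls, rows: Sequence[Sequence[float]]) -> "JaggedBatch":
        values: List[float] = []
        offsets = [0]
        for row in rows:
            values.extend(row)
            offsets.append(len(values))  # end of this row, start of the next
        return cls(values, offsets)

    def __len__(self) -> int:
        # Number of batch elements, not number of stored values.
        return len(self.offsets) - 1

    def row(self, i: int) -> List[float]:
        return self.values[self.offsets[i]:self.offsets[i + 1]]

    def row_sums(self) -> List[float]:
        # A per-element reduction that needs no padding.
        return [sum(self.row(i)) for i in range(len(self))]


# Three batch elements of lengths 3, 0, and 2 share one flat buffer.
batch = JaggedBatch.from_lists([[1.0, 2.0, 3.0], [], [4.0, 5.0]])
```

The point of this layout is that variable-sized elements, for example point clouds with different point counts, can be batched without padding to a common length, and any element can be recovered by slicing with two offsets.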