Neural radiance fields (NeRFs) show potential for transforming images captured worldwide into immersive 3D visual experiences. However, most of this visual data remains siloed in our camera rolls because the images contain personal details. Even if the data were made public, learning 3D representations of the billions of scenes captured daily in a centralized manner would be computationally intractable. Our approach, DecentNeRF, is the first attempt at decentralized, crowd-sourced NeRFs, requiring $\sim 10^4\times$ less server computation per scene than a centralized approach. Instead of sending raw data, users send a 3D representation, which distributes the high computation cost of training centralized NeRFs among the users. DecentNeRF learns photorealistic scene representations by decomposing users' 3D views into personal and global NeRFs and applying a novel optimally weighted aggregation to only the latter. We validate that our approach learns photorealistic NeRFs at minimal server computation cost on structured synthetic and real-world photo tourism datasets. We further analyze how secure aggregation of global NeRFs in DecentNeRF minimizes undesired reconstruction of personal content by the server.
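To make the idea of aggregating only the global NeRFs concrete, the following is a minimal toy sketch, not the DecentNeRF implementation. It assumes each user's global NeRF is flattened into a list of floats and that the aggregation weights are already known. The abstract does not say how the optimal weights are computed, so they are plain inputs here. Secure aggregation is simulated with pairwise additive masks that cancel in the sum. The function names (`weighted_average`, `pairwise_masks`, `secure_weighted_average`) are hypothetical.

```python
import random


def weighted_average(params, weights):
    """Plain weighted average of per-user parameter vectors (lists of floats)."""
    total = sum(weights)
    dim = len(params[0])
    return [
        sum(w * p[k] for w, p in zip(weights, params)) / total
        for k in range(dim)
    ]


def pairwise_masks(num_users, dim, seed=0):
    """Build one mask per user such that all masks sum to zero.

    For every pair i < j, user i adds a random value and user j subtracts it.
    Assumption: a real protocol would derive each pair's value from a shared
    secret; a single seeded generator stands in for that here.
    """
    rng = random.Random(seed)
    masks = [[0.0] * dim for _ in range(num_users)]
    for i in range(num_users):
        for j in range(i + 1, num_users):
            for k in range(dim):
                r = rng.uniform(-1.0, 1.0)
                masks[i][k] += r
                masks[j][k] -= r
    return masks


def secure_weighted_average(params, weights, seed=0):
    """Server-side sum of masked, pre-weighted uploads."""
    total = sum(weights)
    dim = len(params[0])
    masks = pairwise_masks(len(params), dim, seed)
    # Each user uploads only its normalized, weighted vector plus its mask,
    # so an individual upload does not reveal that user's parameters directly.
    uploads = [
        [w / total * p[k] + m[k] for k in range(dim)]
        for p, w, m in zip(params, weights, masks)
    ]
    # The masks cancel when the server adds all uploads together.
    return [sum(u[k] for u in uploads) for k in range(dim)]


if __name__ == "__main__":
    # Three users, each with a 2-parameter "global NeRF" (illustrative values).
    user_params = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    user_weights = [1.0, 1.0, 2.0]
    print(weighted_average(user_params, user_weights))
    print(secure_weighted_average(user_params, user_weights))
```

In this sketch the server only ever sums the uploads, so its work scales with the number of parameters and users rather than with NeRF training. Both functions return the same aggregate up to floating-point rounding, while the masked version hides each individual contribution from the server.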