Multi-view projection methods have proven highly effective for 3D shape recognition, achieving top-performing results. These methods learn how to aggregate information from multiple views. However, the camera viewpoints from which these views are rendered are typically fixed and shared across all shapes. To overcome this static setup, we propose learning the viewpoints instead. Specifically, we introduce the Multi-View Transformation Network (MVTN), which uses differentiable rendering to determine optimal viewpoints for 3D shape recognition. Because the rendering step is differentiable, MVTN can be trained end-to-end with any multi-view network for 3D shape classification. We integrate MVTN into a novel adaptive multi-view pipeline that can render both 3D meshes and point clouds. Our approach achieves state-of-the-art performance in 3D classification and shape retrieval on several benchmarks (ModelNet40, ScanObjectNN, ShapeNet Core55). Further analysis shows that our approach is more robust to occlusion than other methods. We also investigate additional aspects of MVTN, such as 2D pretraining and its use for segmentation. To support further research in this area, we have released MVTorch, a PyTorch library for 3D understanding and generation using multi-view projections.
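To make the contrast between fixed and learned viewpoints concrete, the following is a minimal sketch in plain Python, not MVTN's actual implementation or the MVTorch API. It assumes viewpoints are parameterized as (azimuth, elevation) pairs, that the fixed baseline is a circular configuration, and that a learned regressor outputs unbounded per-view offsets squashed with tanh. All function names, bounds, and the camera distance are hypothetical. In a full pipeline the offsets would be predicted by a network and trained with gradients passed back through a differentiable renderer; here they are supplied by hand.

```python
import math


def circular_viewpoints(num_views, elevation_deg=30.0):
    """Fixed baseline: cameras evenly spaced in azimuth at one elevation."""
    step = 360.0 / num_views
    return [(i * step, elevation_deg) for i in range(num_views)]


def adapt_viewpoints(initial, raw_offsets, max_azim_deg=180.0, max_elev_deg=90.0):
    """Shift each (azimuth, elevation) pair by a bounded per-view offset.

    raw_offsets stands in for the unbounded output of a learned regressor
    (hypothetical here); tanh squashes each value into (-1, 1) before scaling.
    """
    adapted = []
    for (azim, elev), (d_azim, d_elev) in zip(initial, raw_offsets):
        new_azim = (azim + max_azim_deg * math.tanh(d_azim)) % 360.0
        new_elev = elev + max_elev_deg * math.tanh(d_elev)
        # Keep the camera away from the poles, where the up vector degenerates.
        new_elev = max(-89.0, min(89.0, new_elev))
        adapted.append((new_azim, new_elev))
    return adapted


def camera_position(azim_deg, elev_deg, distance=2.2):
    """Convert spherical viewpoint angles to a Cartesian position (y-up)."""
    a, e = math.radians(azim_deg), math.radians(elev_deg)
    return (
        distance * math.cos(e) * math.sin(a),
        distance * math.sin(e),
        distance * math.cos(e) * math.cos(a),
    )


if __name__ == "__main__":
    fixed = circular_viewpoints(4)
    # Hand-picked offsets for illustration; a trained model would predict these per shape.
    learned = adapt_viewpoints(fixed, [(0.1, -0.2), (0.0, 0.3), (-0.4, 0.0), (0.2, 0.2)])
    for (fa, fe), (la, le) in zip(fixed, learned):
        print(f"fixed=({fa:6.1f}, {fe:5.1f})  adapted=({la:6.1f}, {le:5.1f})")
```

With all offsets set to zero the adapted viewpoints coincide with the fixed configuration, so under these assumptions the fixed setup is a special case of the adaptive one.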