3D generation methods have shown visually compelling results powered by diffusion image priors. However, they often fail to produce realistic geometric details, resulting in overly smooth surfaces or geometric details inaccurately baked into albedo maps. To address this, we introduce a new method that incorporates touch as an additional modality to improve the geometric details of generated 3D assets. We design a lightweight 3D texture field that synthesizes visual and tactile textures, guided by 2D diffusion model priors in both the visual and tactile domains. We condition the visual texture generation on high-resolution tactile normals and guide the patch-based tactile texture refinement with a customized TextureDreambooth. We further present a multi-part generation pipeline that enables us to synthesize different textures across various regions. To our knowledge, we are the first to leverage high-resolution tactile sensing to enhance geometric details for 3D generation tasks. We evaluate our method in both text-to-3D and image-to-3D settings. Our experiments demonstrate that our method produces customized, realistic fine geometric textures while maintaining accurate alignment between the visual and tactile modalities.