We present MS2Mesh-XR, a novel multi-modal sketch-to-mesh generation pipeline that enables users to create realistic 3D objects in extended reality (XR) environments using hand-drawn sketches assisted by voice inputs. Specifically, users can intuitively sketch objects with natural mid-air hand movements in a virtual environment. By integrating voice inputs, we employ ControlNet to infer realistic images from the drawn sketches and the interpreted text prompts. Users can then review and select their preferred image, which is subsequently reconstructed into a detailed 3D mesh using the Convolutional Reconstruction Model (CRM). Notably, our pipeline can generate a high-quality 3D mesh in under 20 seconds, enabling immersive visualization and manipulation in run-time XR scenes. We demonstrate the practicality of our pipeline through two use cases in XR settings. By leveraging natural user inputs and cutting-edge generative AI capabilities, our approach can significantly facilitate XR-based creative production and enhance user experiences. Our code and demo will be available at: https://yueqiu0911.github.io/MS2Mesh-XR/
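The three stages described above (mid-air sketch capture, ControlNet-based image inference conditioned on the sketch and a voice-derived text prompt, and CRM image-to-mesh reconstruction) can be sketched structurally as follows. This is a minimal, hypothetical skeleton: the data types, function names, and placeholder stage bodies are illustrative assumptions, not the authors' implementation, which would invoke the actual ControlNet and CRM models.

```python
# Hypothetical structural sketch of the MS2Mesh-XR pipeline.
# Stage bodies are placeholders; a real system would run ControlNet
# (sketch + text prompt -> candidate images) and the Convolutional
# Reconstruction Model (selected image -> 3D mesh).

from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sketch:
    strokes: List[Tuple[float, float, float]]  # mid-air 3D stroke points


@dataclass
class CandidateImage:
    pixels: bytes   # rendered candidate image
    prompt: str     # text prompt interpreted from voice input


@dataclass
class Mesh:
    vertices: int
    faces: int


def infer_images(sketch: Sketch, prompt: str, n: int = 4) -> List[CandidateImage]:
    """Placeholder for ControlNet inference conditioned on sketch and prompt."""
    return [CandidateImage(pixels=b"", prompt=prompt) for _ in range(n)]


def reconstruct_mesh(image: CandidateImage) -> Mesh:
    """Placeholder for CRM single-image-to-mesh reconstruction."""
    return Mesh(vertices=0, faces=0)


def run_pipeline(sketch: Sketch, prompt: str,
                 select: Callable[[List[CandidateImage]], CandidateImage]) -> Mesh:
    """End-to-end flow: sketch + prompt -> candidates -> user choice -> mesh."""
    candidates = infer_images(sketch, prompt)
    chosen = select(candidates)  # user reviews candidates and picks one
    return reconstruct_mesh(chosen)
```

The `select` callback models the interactive review step, where the user inspects the generated candidates in the XR scene before reconstruction proceeds.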