In this paper, we introduce X-Ray, an approach to 3D generation built on a new sequential representation. Inspired by the ability of X-ray scans to reveal depth, the representation captures both the external and internal features of an object. At its core, the method casts rays from the camera's viewpoint and records the geometric and textural details of every surface each ray intersects. This process condenses a complete object or scene into a multi-frame format, much like a video, so the 3D representation consists solely of surface information. To demonstrate the practicality and adaptability of the X-Ray representation, we use it to synthesize 3D objects with a network architecture similar to those used in video diffusion models. The results show that our representation improves both the accuracy and efficiency of 3D synthesis, suggesting new directions for research and practical applications in the field.
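To make the ray-casting idea concrete, the following is a minimal sketch rather than the authors' implementation. It assumes an orthographic camera, a toy scene of analytic spheres, and a hypothetical per-frame channel layout of hit flag, depth, and RGB color; the names `sphere_hits` and `encode_xray` are illustrative only. Each ray's surface intersections are sorted by depth, and the k-th intersection across all pixels forms the k-th frame, which yields a video-like tensor.

```python
import numpy as np


def sphere_hits(origins, dirs, center, radius):
    """Entry and exit depths of each ray with one sphere; NaN where the ray misses."""
    oc = origins - center                          # (N, 3)
    b = np.einsum("ij,ij->i", oc, dirs)            # dirs are assumed unit length
    c = np.einsum("ij,ij->i", oc, oc) - radius ** 2
    disc = b * b - c
    root = np.sqrt(np.where(disc >= 0, disc, np.nan))
    return np.stack([-b - root, -b + root], axis=1)  # (N, 2)


def encode_xray(spheres, height=32, width=32, num_layers=4):
    """Encode a toy scene as frames of shape (num_layers, H, W, 5).

    Channels (an assumed layout): 0 = hit flag, 1 = depth, 2..4 = RGB.
    `spheres` is a list of (center, radius, color) tuples.
    """
    # Assumption: orthographic camera at z = -2 looking along +z.
    ys, xs = np.meshgrid(np.linspace(-1, 1, height),
                         np.linspace(-1, 1, width), indexing="ij")
    origins = np.stack([xs.ravel(), ys.ravel(), np.full(xs.size, -2.0)], axis=1)
    dirs = np.tile(np.array([0.0, 0.0, 1.0]), (origins.shape[0], 1))

    depths, colors = [], []
    for center, radius, color in spheres:
        t = sphere_hits(origins, dirs, np.asarray(center, float), radius)
        depths.append(t)
        colors.append(np.broadcast_to(np.asarray(color, float), (t.shape[0], 2, 3)))
    depths = np.concatenate(depths, axis=1)          # (N, 2 * num_spheres)
    colors = np.concatenate(colors, axis=1)          # (N, 2 * num_spheres, 3)
    depths = np.where(depths > 0, depths, np.nan)    # ignore hits behind the camera

    # Sort every ray's intersections front to back; NaN (no hit) sorts last.
    order = np.argsort(depths, axis=1)
    depths = np.take_along_axis(depths, order, axis=1)
    colors = np.take_along_axis(colors, order[..., None], axis=1)

    k = min(num_layers, depths.shape[1])
    hit = ~np.isnan(depths[:, :k])                   # (N, k)
    frames = np.zeros((num_layers, height, width, 5))
    frames[:k, ..., 0] = hit.T.reshape(k, height, width)
    frames[:k, ..., 1] = np.where(hit, depths[:, :k], 0.0).T.reshape(k, height, width)
    frames[:k, ..., 2:] = (np.where(hit[..., None], colors[:, :k], 0.0)
                           .transpose(1, 0, 2).reshape(k, height, width, 3))
    return frames


if __name__ == "__main__":
    # A red shell with a green sphere hidden inside it.
    scene = [((0, 0, 0), 0.5, (1, 0, 0)), ((0, 0, 0), 0.25, (0, 1, 0))]
    frames = encode_xray(scene, height=33, width=33, num_layers=4)
    print(frames.shape)
```

In this toy scene the inner sphere is invisible in a single rendered view, yet it appears in the second and third frames, which illustrates how a layered encoding of this kind can retain interior surfaces. Because the output has a frames-by-height-by-width-by-channels shape, it could in principle be fed to a video-style network, although the architecture itself is outside the scope of this sketch.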