The volumetric representation of human interactions is fundamental to the development of immersive media productions and telecommunication applications. Particularly given the rapid advancement of Extended Reality (XR) applications, volumetric data has proven to be an essential technology for future XR development. In this work, we present a new multimodal database to help advance the development of immersive technologies. The proposed database provides ethically compliant and diverse volumetric data: recordings of 27 participants displaying posed facial expressions and subtle body movements while speaking, plus 11 participants wearing head-mounted displays (HMDs). The recording system consists of a volumetric capture (VoCap) studio with 31 synchronized modules comprising 62 RGB cameras and 31 depth cameras. In addition to textured meshes, point clouds, and multi-view RGB-D data, we use one Lytro Illum camera to capture light field (LF) data simultaneously. Finally, we evaluate the use of our dataset on three tasks: facial expression classification, HMD removal, and point cloud reconstruction. The dataset can support the evaluation and performance testing of various XR algorithms, including but not limited to facial expression recognition and reconstruction, facial reenactment, and volumetric video. HEADSET, all of its associated raw data, and the license agreement will be made publicly available for research purposes.