In this paper, we work to bring telepresence to every desktop. Unlike commercial systems, personal 3D video conferencing systems must render high-quality videos while remaining financially and computationally viable for the average consumer. To this end, we introduce a capture and rendering system that requires only four consumer-grade RGBD cameras and synthesizes high-quality free-viewpoint videos of users as well as their environments. Experimental results show that our system renders high-quality free-viewpoint videos without using object templates or heavy pre-processing. While not real-time, our system is fast and does not require per-video optimization. Moreover, our system is robust to complex hand gestures and clothing, and it generalizes to new users. This work provides a strong basis for further optimization, and it will help bring telepresence to every desk in the near future. The code and dataset will be made available on our website: https://mcmvmc.github.io/PersonalTelepresence/.