Efficient 3D Reconstruction, Streaming and Visualization of Static and Dynamic Scene Parts for Multi-client Live-telepresence in Large-scale Environments

March 13, 2023

Leif Van Holland, Patrick Stotko, Stefan Krumpen, Reinhard Klein, Michael Weinmann

Despite the impressive progress of telepresence systems for room-scale scenes with static and dynamic scene entities, extending their capabilities to larger dynamic environments beyond a fixed size of a few square meters remains challenging. In this paper, we aim to share 3D live-telepresence experiences in large-scale environments beyond room scale, containing both static and dynamic scene entities, at practical bandwidth requirements and based only on lightweight scene capture with a single moving consumer-grade RGB-D camera. To this end, we present a system built upon a novel hybrid volumetric scene representation that combines two parts. The first is a voxel-based representation for the static contents, which stores not only the reconstructed surface geometry but also information about object semantics and their accumulated dynamic movement over time. The second is a point-cloud-based representation for dynamic scene parts, which are separated from the static parts based on semantic and instance information extracted for the input frames. Static and dynamic content are streamed independently yet simultaneously, with potentially moving but currently static scene entities seamlessly integrated into the static model until they become dynamic again, and the static and dynamic data are fused at the remote client. In this way, our system achieves VR-based live-telepresence at close to real-time rates. Our evaluation demonstrates the potential of our approach in terms of visual quality and performance, and includes ablation studies of the involved design choices.
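
The hybrid representation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the class `HybridScene`, its fields, the voxel size, and the per-point input format are all hypothetical and chosen only for illustration. Points whose instance is currently moving go into a per-frame point list, while all other points are accumulated in a sparse voxel map that also stores a semantic label. When an instance starts moving, its voxels are dropped from the static part.

```python
import math
from dataclasses import dataclass, field


@dataclass
class HybridScene:
    """Hypothetical container: sparse voxels for static content, points for dynamic content."""
    voxel_size: float = 0.05  # assumed voxel edge length in meters
    # voxel index -> (semantic label, instance id, observation count)
    static_voxels: dict = field(default_factory=dict)
    # (x, y, z, instance id) for the current frame only
    dynamic_points: list = field(default_factory=list)

    def integrate(self, points, dynamic_instances):
        """Integrate one frame.

        points: iterable of (x, y, z, label, instance_id)
        dynamic_instances: set of instance ids considered moving in this frame
        """
        # An instance that became dynamic is removed from the static model.
        self.static_voxels = {
            key: value for key, value in self.static_voxels.items()
            if value[1] not in dynamic_instances
        }
        # The dynamic part is rebuilt from scratch every frame.
        self.dynamic_points = []
        for x, y, z, label, inst in points:
            if inst in dynamic_instances:
                self.dynamic_points.append((x, y, z, inst))
                continue
            key = tuple(math.floor(c / self.voxel_size) for c in (x, y, z))
            count = self.static_voxels.get(key, (label, inst, 0))[2]
            self.static_voxels[key] = (label, inst, count + 1)
```

In this sketch a chair that is standing still is stored in `static_voxels`; once its instance id appears in `dynamic_instances`, its voxels are removed and its points appear in `dynamic_points` instead. The two parts could then be streamed separately, which is the idea behind the independent static and dynamic channels in the abstract.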
