LiveVV: Human-Centered Live Volumetric Video Streaming System

Volumetric video has emerged as a prominent medium in eXtended Reality (XR), driven by advances in computer graphics and depth capture hardware. Users can fully immerse themselves in volumetric video by switching their viewport in six degrees of freedom (DoF): three rotational dimensions (yaw, pitch, roll) and three translational dimensions (X, Y, Z). Unlike traditional 2D videos, which are composed of pixel matrices, volumetric videos use point clouds, meshes, or voxels to represent a volumetric scene, resulting in significantly larger data sizes. While previous works have achieved volumetric video streaming in video-on-demand scenarios, live streaming of volumetric video remains an unresolved challenge due to limited network bandwidth and stringent latency constraints. In this paper, we propose, for the first time, a holistic live volumetric video streaming system, LiveVV, which covers multi-view capture, scene segmentation and reuse, adaptive transmission, and rendering. LiveVV contains multiple lightweight volumetric video capture modules that can be deployed without prior preparation. To reduce bandwidth consumption, LiveVV processes static and dynamic volumetric content separately, reusing static data with low disparity and decimating data with low visual saliency. In addition, to cope with network fluctuation, LiveVV integrates a volumetric video adaptive bitrate streaming algorithm (VABR) that enables fluent playback while maximizing quality of experience. Extensive real-world experiments show that LiveVV can achieve live volumetric video streaming at a frame rate of 24 fps with a latency of less than 350 ms.
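To make the idea of adaptive bitrate selection under network fluctuation concrete, the following is a minimal sketch of a generic throughput-based rule. It is not the VABR algorithm itself, whose details are not given here. The function name `pick_bitrate`, the bitrate ladder values, and the 0.8 safety factor are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def pick_bitrate(
    ladder_mbps: Sequence[float],
    throughput_samples_mbps: Sequence[float],
    safety: float = 0.8,
) -> float:
    """Pick the highest bitrate that fits under a discounted throughput estimate.

    Generic throughput-based rule (an assumption for illustration, not VABR).
    """
    if not ladder_mbps:
        raise ValueError("ladder_mbps must not be empty")

    # Ignore non-positive samples so the harmonic mean is well defined.
    samples = [s for s in throughput_samples_mbps if s > 0]
    if not samples:
        # No usable measurement yet: start from the lowest quality level.
        return min(ladder_mbps)

    # The harmonic mean is dominated by slow samples, so the estimate
    # stays conservative when throughput fluctuates.
    estimate = len(samples) / sum(1.0 / s for s in samples)
    budget = safety * estimate

    feasible = [b for b in ladder_mbps if b <= budget]
    # Fall back to the lowest level if nothing fits the budget.
    return max(feasible) if feasible else min(ladder_mbps)


if __name__ == "__main__":
    # Hypothetical quality levels (Mbps) and recent throughput samples (Mbps).
    print(pick_bitrate([20, 50, 100, 200], [120, 150, 100]))
```

A live system would typically also account for buffer occupancy and the latency budget; this sketch considers throughput only.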
