Immersive multimedia applications, such as Virtual, Augmented, and Mixed Reality, have become more practical with advances in hardware and software for acquiring and rendering 3D media, as well as in 5G/6G wireless networks. Such applications require the delivery of volumetric video to users who move with six degrees of freedom (6-DoF). The point cloud has become a popular volumetric video format due to its flexibility and simplicity. A dense point cloud frame requires far more bandwidth than a 2D or 360-degree video frame. The user Field of View (FoV) is more dynamic under 6-DoF movement than under 3-DoF movement. The quality at which a user perceives a 3D object is affected by point occlusion and viewing distance, both of which change constantly as the user and the object move. To save bandwidth, FoV-adaptive streaming predicts the user FoV and downloads only the data falling within the predicted FoV, but it is vulnerable to FoV prediction errors, which can be significant when a long buffer is used to smooth streaming. In this work, we propose a multi-round progressive refinement framework for point cloud-based volumetric video streaming. Instead of downloading frames sequentially, we simultaneously download/patch multiple frames falling into a sliding time window, leveraging the scalability of point cloud coding. The rate allocation among all tiles of the active frames is solved analytically using heterogeneous tile utility functions calibrated by the predicted user FoV. Multi-frame patching combines the streaming smoothness that results from a long buffer with the higher FoV prediction accuracy available at short buffer lengths. We evaluate our solution using simulations driven by real point cloud videos, bandwidth traces, and 6-DoF FoV traces of real users. The experiments show that our solution is robust against bandwidth and FoV prediction errors, and can deliver high and smooth quality in the face of bandwidth variations and dynamic user movements.
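To make the idea of an analytical rate allocation across tiles concrete, the following is a minimal sketch and not the method of this work. It assumes a hypothetical logarithmic utility `w_i * log(1 + r_i)` for each tile, where the weight `w_i` stands in for an FoV-calibrated importance (for example, a predicted visibility probability) and `r_i` is the rate given to the tile in the current round. The function name `allocate_rates` and the utility form are illustrative assumptions. Under these assumptions, maximizing total utility subject to a bandwidth budget has a closed-form water-filling solution.

```python
def allocate_rates(weights, budget):
    """Split `budget` across tiles to maximize sum_i w_i * log(1 + r_i).

    Assumed (hypothetical) model: w_i >= 0 is an FoV-derived tile weight,
    r_i >= 0 is the rate assigned to tile i, and sum_i r_i == budget.
    The KKT conditions give r_i = max(0, w_i * level - 1) for a common
    "water level" shared by all tiles that receive a positive rate.
    """
    n = len(weights)
    rates = [0.0] * n
    if budget <= 0:
        return rates

    # Only tiles with positive weight can ever receive rate.
    order = sorted((i for i in range(n) if weights[i] > 0),
                   key=lambda i: weights[i], reverse=True)
    if not order:
        return rates

    # Grow the active set from the heaviest tile down; stop at the first
    # tile whose allocation would not be strictly positive.
    weight_sum = 0.0
    level = 0.0
    active = 0
    for k, i in enumerate(order, start=1):
        weight_sum += weights[i]
        candidate = (budget + k) / weight_sum  # water level with k active tiles
        if weights[i] * candidate - 1.0 > 0.0:
            level = candidate
            active = k
        else:
            break

    for i in order[:active]:
        rates[i] = weights[i] * level - 1.0
    return rates


if __name__ == "__main__":
    # Four tiles of the frames in the window; the last is predicted invisible.
    print(allocate_rates([4.0, 2.0, 1.0, 0.0], budget=10.0))
```

In a sliding-window setting, a sketch like this would be re-run each round with updated weights as FoV predictions for nearby frames become more accurate, so that tiles already partially downloaded are patched with additional rate. How the weights and utility curves are actually calibrated is not specified in the abstract.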