Field-of-view (FoV) adaptive streaming significantly reduces the bandwidth requirements of immersive point cloud video (PCV) by transmitting only the points visible within a viewer's FoV. Traditional approaches often focus on trajectory-based six-degree-of-freedom (6DoF) FoV prediction; the predicted FoV is then used to compute point visibility. Such approaches do not explicitly account for the video content's impact on viewer attention, and the conversion from FoV to point visibility is often error-prone and time-consuming. We reformulate the PCV FoV prediction problem from the perspective of cell visibility, enabling precise decisions about transmitting 3D data at the cell level based on the predicted visibility distribution. We develop a novel spatial-visibility and object-aware graph model that leverages historical 3D visibility data and incorporates spatial perception, neighboring-cell correlation, and occlusion information to predict future cell visibility. Our model significantly improves long-term cell visibility prediction, reducing the prediction MSE loss by up to 50% compared with state-of-the-art models while maintaining real-time performance (over 30 fps) on point cloud videos with more than 1 million points.