As the complexity of 3D digital content grows exponentially, understanding human visual attention becomes critical for optimizing rendering and processing resources. Reliable 3D mesh saliency ground truth (GT) is therefore essential for human-centric visual modeling in virtual reality (VR). However, existing VR eye-tracking frameworks are fundamentally bottlenecked by their underlying acquisition and generation mechanisms. First, the reliance on zero-area single ray sampling (SRS) fails to capture contextual features, leading to severe texture aliasing and discontinuous saliency signals. Second, conventional Euclidean smoothing propagates saliency across disconnected physical gaps, causing semantic confusion on complex 3D manifolds. This paper proposes a robust framework that addresses both limitations. We first introduce a view cone sampling (VCS) strategy, which simulates the human foveal receptive field with Gaussian-distributed ray bundles to improve sampling robustness on complex topologies. We then develop a hybrid manifold-Euclidean constrained diffusion (HCD) algorithm, which fuses manifold geodesic constraints with Euclidean scales to ensure topologically consistent saliency propagation. Subjective experiments, together with qualitative and quantitative evaluations, demonstrate improved performance over baseline methods and benefits for downstream tasks. By mitigating "topological short-circuits" and aliasing, our framework provides a high-fidelity 3D attention acquisition paradigm that aligns with natural human perception, offering a more accurate and robust baseline for 3D mesh saliency research.
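To make the two ideas in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that a view cone can be approximated by perturbing the gaze direction with Gaussian angular offsets, and that a "topological short-circuit" can be suppressed by zeroing a Euclidean Gaussian weight wherever the along-mesh (graph geodesic) distance greatly exceeds the straight-line distance. The function names, `sigma_deg`, `sigma`, and `max_ratio` are all hypothetical; the abstract does not specify the actual kernel or fusion rule.

```python
import heapq
import numpy as np


def sample_view_cone(gaze_dir, n_rays=64, sigma_deg=1.0, seed=0):
    """Return (n_rays, 3) unit directions scattered around gaze_dir.

    Angular offsets follow an isotropic 2D Gaussian; sigma_deg is an
    assumed foveal spread, not a value taken from the paper.
    """
    rng = np.random.default_rng(seed)
    g = np.asarray(gaze_dir, dtype=float)
    g = g / np.linalg.norm(g)
    # Build an orthonormal basis (u, v) of the plane perpendicular to g.
    helper = np.array([1.0, 0.0, 0.0]) if abs(g[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(g, helper)
    u = u / np.linalg.norm(u)
    v = np.cross(g, u)
    # Gaussian angles (radians) mapped to tangent-plane displacements.
    offsets = np.tan(rng.normal(0.0, np.radians(sigma_deg), size=(n_rays, 2)))
    dirs = g + offsets[:, :1] * u + offsets[:, 1:] * v
    return dirs / np.linalg.norm(dirs, axis=1, keepdims=True)


def graph_geodesic(vertices, edges, source):
    """Dijkstra shortest-path distance along mesh edges from source."""
    adj = [[] for _ in range(len(vertices))]
    for i, j in edges:
        w = float(np.linalg.norm(vertices[i] - vertices[j]))
        adj[i].append((j, w))
        adj[j].append((i, w))
    dist = np.full(len(vertices), np.inf)
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d, i = heapq.heappop(heap)
        if d > dist[i]:
            continue  # stale heap entry
        for j, w in adj[i]:
            if d + w < dist[j]:
                dist[j] = d + w
                heapq.heappush(heap, (d + w, j))
    return dist


def hybrid_weights(vertices, edges, source, sigma=1.0, max_ratio=2.0):
    """Euclidean Gaussian weights, zeroed where geodesic >> Euclidean."""
    euclid = np.linalg.norm(vertices - vertices[source], axis=1)
    geo = graph_geodesic(vertices, edges, source)
    weights = np.exp(-euclid**2 / (2.0 * sigma**2))
    # Assumed gating rule: block propagation across physical gaps.
    weights[geo > max_ratio * np.maximum(euclid, 1e-12)] = 0.0
    return weights


# Hairpin-shaped strip: vertex 5 is spatially close to vertex 0 but far
# away when measured along the surface.
verts = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0],
                  [2, 0.1, 0], [1, 0.1, 0], [0, 0.1, 0]], dtype=float)
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
rays = sample_view_cone([0.0, 0.0, -1.0], n_rays=32)
weights = hybrid_weights(verts, edges, source=0)
```

In this toy strip, pure Euclidean smoothing would give vertex 5 nearly the same weight as the fixated vertex 0, whereas the gated weights cut it off because the path along the surface is much longer than the straight-line gap. A real mesh would call for a proper geodesic solver and a smoother fusion than this hard threshold.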