Creating high-fidelity, real-time drivable 3D head avatars is a core challenge in digital animation. While 3D Gaussian Splatting (3D-GS) offers unprecedented rendering speed and quality, current animation techniques often rely on a "one-size-fits-all" global tuning approach, in which all Gaussian primitives are uniformly driven by a single expression code. This simplistic approach fails to disentangle the distinct dynamics of different facial regions, such as deformable skin versus rigid teeth, leading to significant blurring and distortion artifacts. We introduce Conditionally-Adaptive Gaussian Avatars (CAG-Avatar), a framework that resolves this key limitation. At its core is a Conditionally Adaptive Fusion Module built on cross-attention. This mechanism empowers each 3D Gaussian to act as a query, adaptively extracting relevant driving signals from the global expression code based on its canonical position. This "tailor-made" conditioning strategy substantially enhances the modeling of fine-grained, localized dynamics. Our experiments confirm a significant improvement in reconstruction fidelity, particularly for challenging regions such as teeth, while preserving real-time rendering performance.
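The per-Gaussian conditioning described above can be sketched as a standard cross-attention step in which each Gaussian's canonical position produces a query and the global expression code supplies the keys and values. The following is a minimal NumPy sketch under assumed dimensions (3-D positions, a 64-D expression code split into 8 tokens, 32-D attention features); all shapes and projection matrices here are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_fusion(positions, expr_tokens, Wq, Wk, Wv):
    """Cross-attention: each Gaussian queries the expression tokens.

    positions:   (N, 3)  canonical Gaussian centers (queries)
    expr_tokens: (T, E)  tokens of the global expression code (keys/values)
    Returns a per-Gaussian driving signal of shape (N, d).
    """
    q = positions @ Wq                                # (N, d)
    k = expr_tokens @ Wk                              # (T, d)
    v = expr_tokens @ Wv                              # (T, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))    # (N, T) attention weights
    return attn @ v                                   # (N, d) tailored signal

# Hypothetical dimensions: 100 Gaussians, 8 expression tokens of width 64.
d = 32
Wq = rng.standard_normal((3, d))
Wk = rng.standard_normal((64, d))
Wv = rng.standard_normal((64, d))
signal = adaptive_fusion(rng.standard_normal((100, 3)),
                         rng.standard_normal((8, 64)), Wq, Wk, Wv)
print(signal.shape)  # (100, 32)
```

Because the attention weights depend on each Gaussian's position, primitives in rigid regions (e.g., teeth) can attend to different expression tokens than primitives on deformable skin, which is the "tailor-made" behavior the module is designed to provide.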