Gaze and head movements play a central role in expressive 3D media, human-agent interaction, and immersive communication. Existing works often model facial components in isolation and lack mechanisms for generating personalized, style-aware gaze behaviors. We propose StyGazeTalk, a multimodal framework that synthesizes synchronized gaze-head dynamics with controllable styles. To support high-fidelity training, we construct HAGE, a high-precision multimodal dataset containing eye-tracking data, audio, head pose, and 3D facial parameters. Experiments show that our method produces temporally coherent, style-consistent gaze-head motions, enhancing realism in 3D face generation.