High-fidelity gaze redirection is critical for generating augmented data to improve the generalization of gaze estimators. 3D Gaussian Splatting (3DGS) models such as GazeGaussian represent the state of the art but can struggle to render subtle, continuous gaze shifts. In this paper, we propose DiT-Gaze, a framework that enhances 3D gaze redirection models through a novel combination of a Diffusion Transformer (DiT), weak supervision across gaze angles, and an orthogonality constraint loss. The DiT backbone enables higher-fidelity image synthesis, while our weak supervision strategy, which uses synthetically generated intermediate gaze angles, provides a smooth manifold of gaze directions during training. The orthogonality constraint loss mathematically enforces the disentanglement of the internal representations for gaze, head pose, and expression. Comprehensive experiments show that DiT-Gaze sets a new state of the art in both perceptual quality and redirection accuracy, reducing gaze error by 4.1% relative to the previous best method, to 6.353 degrees, and thereby providing a superior source of synthetic training data. Our code and models will be made available for the research community to benchmark against.
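For concreteness, a minimal sketch of how such an orthogonality constraint is commonly instantiated (the paper's exact formulation is not given in this abstract; the latent codes $z_g$, $z_h$, and $z_e$ for gaze, head pose, and expression are illustrative assumptions) is to penalize the squared pairwise cosine similarity between the three embeddings:

$$
\mathcal{L}_{\text{orth}} \;=\; \sum_{(a,b)\,\in\,\{(g,h),\,(g,e),\,(h,e)\}} \left( \frac{z_a^{\top} z_b}{\lVert z_a \rVert \,\lVert z_b \rVert} \right)^{2}
$$

Driving each pairwise similarity toward zero pushes the three latent subspaces toward mutual orthogonality, so that editing the gaze code alone leaves head pose and expression unchanged.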