ReDirTrans: Latent-to-Latent Translation for Gaze and Head Redirection

Learning-based gaze estimation methods require large amounts of training data with accurate gaze annotations. To ease the cost of collecting and annotating such data, several image synthesis methods have been proposed that redirect gaze precisely to assigned target directions. However, these methods operate on images that contain only the eyes or a restricted face region at low resolution (less than $128\times128$), a choice that reduces interference from other attributes such as hair but limits the application scenarios. To address this limitation, we propose a portable network, called ReDirTrans, that performs latent-to-latent translation to redirect gaze directions and head orientations in an interpretable manner. ReDirTrans projects input latent vectors into embeddings of the target attributes only and redirects these embeddings with assigned pitch and yaw values. Both the initial and the edited embeddings are then projected back (deprojected) to the initial latent space as residuals, which modify the input latent vectors by subtraction and addition: subtraction removes the old status and addition inserts the new one. Projecting only the target attributes and replacing status by subtraction and addition mitigates the impact on other attributes and on the distribution of latent vectors. By combining ReDirTrans with a pretrained, fixed e4e-StyleGAN pair, we create ReDirTrans-GAN, which accurately redirects gaze in full-face images at $1024\times1024$ resolution while preserving other attributes such as identity, expression, and hairstyle. Furthermore, we show improvements on the downstream learning-based gaze estimation task when redirected samples are used for dataset augmentation.
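
The project, redirect, deproject, and subtract-add steps described above can be illustrated with a toy sketch. It is not the authors' implementation. Several things here are assumptions made only for illustration: the projector and deprojector are random linear maps standing in for learned networks, the embedding is treated as a set of 3D vectors redirected by a rotation matrix built from pitch and yaw, the source angles are taken as known, and all names (`project`, `deproject`, `redirect`, `rotation`) are hypothetical.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def rotation(pitch, yaw):
    """3x3 rotation from pitch (about x) and yaw (about y), in radians."""
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    rx = [[1, 0, 0], [0, cp, -sp], [0, sp, cp]]
    ry = [[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]]
    return matmul(ry, rx)


def random_matrix(rows, cols, rng):
    """Stand-in for learned weights (assumption: a single linear layer)."""
    return [[rng.gauss(0, 0.1) for _ in range(cols)] for _ in range(rows)]


def project(w, proj, k):
    """Map a latent vector to a 3 x k embedding of the target attribute."""
    flat = [sum(p * x for p, x in zip(row, w)) for row in proj]
    return [flat[i * k:(i + 1) * k] for i in range(3)]


def deproject(z, deproj):
    """Map a 3 x k embedding back to a residual in the latent space."""
    flat = [v for row in z for v in row]
    return [sum(q * v for q, v in zip(row, flat)) for row in deproj]


def redirect(w, proj, deproj, k, source, target):
    """Replace the old attribute status in w with the new one."""
    z_old = project(w, proj, k)
    # Undo the source rotation, then apply the target rotation.
    r = matmul(rotation(*target), transpose(rotation(*source)))
    z_new = matmul(r, z_old)
    old_residual = deproject(z_old, deproj)
    new_residual = deproject(z_new, deproj)
    # Subtraction removes the old status; addition inserts the new one.
    return [x - o + n for x, o, n in zip(w, old_residual, new_residual)]


if __name__ == "__main__":
    rng = random.Random(0)
    dim, k = 8, 4  # toy sizes, far smaller than a StyleGAN latent
    proj = random_matrix(3 * k, dim, rng)
    deproj = random_matrix(dim, 3 * k, rng)
    w = [rng.gauss(0, 1) for _ in range(dim)]
    edited = redirect(w, proj, deproj, k, (0.1, -0.2), (0.3, 0.4))
    print(edited)
```

One property of this formulation is visible in the sketch: when the target angles equal the source angles, the two residuals cancel and the latent vector is returned unchanged, so the edit touches the latent only as far as the target attribute moves.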
