Researchers have proposed using human preference feedback data to fine-tune text-to-image generative models. However, the scalability of human feedback collection is limited by its reliance on manual annotation. We therefore develop and test a method that automatically annotates user preferences from users' spontaneous facial expression reactions to the generated images. We collect a dataset of Facial Expression Reaction to Generated Images (FERGI) and show that the activations of multiple facial action units (AUs) are highly correlated with user evaluations of the generated images. Specifically, AU4 (brow lowerer) most consistently reflects negative evaluations of the generated image. This finding is useful in two ways. First, for image pairs that elicit substantially different AU4 responses, we can automatically annotate user preferences with an accuracy significantly higher than that of state-of-the-art scoring models. Second, directly integrating the AU4 responses with the scoring models improves their consistency with human preferences. Additionally, the AU4 response best reflects the user's evaluation of image fidelity, making it complementary to the state-of-the-art scoring models, which are generally better at reflecting image-text alignment. Finally, this method of automatic annotation through facial expression analysis can potentially be generalized to other generation tasks. The code is available at https://github.com/ShuangquanFeng/FERGI, and the dataset is available at the same link for research purposes.
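
The two uses of the AU4 response described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_gap` threshold, the `weight` parameter, and the linear penalty are all assumptions chosen for illustration, and the AU4 values are taken to be per-image activation estimates from some facial expression analysis step.

```python
from typing import Optional


def annotate_pair(au4_a: float, au4_b: float, min_gap: float = 0.5) -> Optional[str]:
    """Label the preferred image of a pair from AU4 (brow lowerer) responses.

    Returns "a" or "b" for the preferred image, or None when the two AU4
    responses are too close to annotate. `min_gap` is a hypothetical threshold.
    """
    gap = au4_a - au4_b
    if abs(gap) < min_gap:
        return None  # AU4 difference is not substantial; leave the pair unlabeled
    # A stronger AU4 response is read as a more negative evaluation,
    # so the image with the weaker response is taken as preferred.
    return "b" if gap > 0 else "a"


def fused_score(model_score: float, au4: float, weight: float = 1.0) -> float:
    """Combine a scoring model's output with the AU4 response.

    Assumed form: a linear penalty on AU4 activation. `weight` is hypothetical.
    """
    return model_score - weight * au4
```

For example, `annotate_pair(0.9, 0.1)` labels image "b" as preferred, while `annotate_pair(0.2, 0.3)` abstains because the difference falls below the assumed threshold.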