Researchers have proposed fine-tuning text-to-image generative models on human preference feedback data. However, collecting such feedback at scale has been limited by its reliance on manual annotation. We therefore develop and test a method that automatically annotates user preferences from users' spontaneous facial expression reactions to generated images. We collect a dataset of Facial Expression Reaction to Generated Images (FERGI) and show that the activations of multiple facial action units (AUs) are highly correlated with user evaluations of the generated images. Specifically, AU4 (brow lowerer) reflects negative evaluations of a generated image, whereas AU12 (lip corner puller) reflects positive evaluations. These findings are useful in two ways. First, for image pairs with substantial differences in these AU responses, we can automatically annotate user preferences with an accuracy that significantly outperforms state-of-the-art scoring models. Second, directly integrating the AU responses with the scoring models improves their consistency with human preferences. Finally, this method of automatic annotation through facial expression analysis can potentially be generalized to other generation tasks. The code is available at https://github.com/ShuangquanFeng/FERGI, and the dataset is available at the same link for research purposes.
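To make the pairwise annotation idea concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes each image already has scalar AU4 and AU12 response estimates, combines them into a hypothetical valence score (AU12 minus AU4), and labels a pair only when the gap between the two scores exceeds a hypothetical threshold `min_gap`. The function names, the scoring rule, and the threshold value are all assumptions introduced for illustration.

```python
from typing import Mapping, Optional


def valence_score(au: Mapping[str, float]) -> float:
    """Hypothetical score: AU12 (positive cue) minus AU4 (negative cue)."""
    return au.get("AU12", 0.0) - au.get("AU4", 0.0)


def annotate_preference(
    au_a: Mapping[str, float],
    au_b: Mapping[str, float],
    min_gap: float = 0.5,  # assumed threshold, not taken from the paper
) -> Optional[str]:
    """Return "a" or "b" for the preferred image, or None if the AU
    responses do not differ enough to annotate the pair."""
    gap = valence_score(au_a) - valence_score(au_b)
    if gap >= min_gap:
        return "a"
    if gap <= -min_gap:
        return "b"
    return None  # difference too small; leave the pair unlabeled


# Usage: a smile for image A and a brow lowering for image B
smile = {"AU4": 0.1, "AU12": 0.9}
frown = {"AU4": 0.8, "AU12": 0.0}
print(annotate_preference(smile, frown))
```

Returning `None` for small gaps mirrors the abstract's restriction to image pairs with a substantial difference in AU responses. Pairs left unlabeled this way could still be ranked by a scoring model.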