People can innately recognize human facial expressions in unnatural forms, such as on the unusual faces drawn in cartoons or when mapped onto an animal's features. Current machine learning algorithms, however, struggle with out-of-domain transfer in facial expression recognition (FER). We propose a biologically inspired mechanism for such transfer learning based on norm-referenced encoding, in which patterns are encoded as difference vectors relative to a domain-specific reference vector. By incorporating domain-specific reference frames, we demonstrate high data efficiency in transfer learning across multiple domains. The proposed architecture offers an explanation for how the human brain might innately recognize facial expressions on varying head shapes (humans, monkeys, and cartoon avatars) without extensive training. Norm-referenced encoding also allows the intensity of an expression to be read out directly from neural unit activity, similar to face-selective neurons in the brain. Our model achieves a classification accuracy of 92.15% on the FERG dataset with extreme data efficiency: we train the proposed mechanism with only 12 images, including a single image of each class (facial expression) and one image per domain (avatar). In comparison, the authors of the FERG dataset achieved a classification accuracy of 89.02% with their FaceExpr model, which was trained on 43,000 images.
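To make the idea of norm-referenced encoding concrete, the following is a minimal sketch, not the authors' implementation. It assumes that each face has already been reduced to a feature vector, that the domain reference is the feature vector of one reference image per domain, and that a unit's activity is the length of the difference vector scaled by a rectified cosine to a learned tuning direction. The function names, the exponent `nu`, and the toy vectors are all hypothetical and chosen only for illustration.

```python
import numpy as np


def fit_directions(examples, reference):
    """Learn one unit-length tuning direction per expression.

    `examples` maps an expression label to a single feature vector taken
    from ONE domain; `reference` is that domain's reference vector.
    """
    directions = {}
    for label, x in examples.items():
        diff = np.asarray(x, dtype=float) - reference
        directions[label] = diff / np.linalg.norm(diff)
    return directions


def encode(x, reference, directions, nu=2.0):
    """Norm-referenced activity per expression unit (assumed form).

    activity = ||x - r|| * max(0, cos(x - r, n))**nu, so the activity
    grows with the distance from the domain reference r.
    """
    diff = np.asarray(x, dtype=float) - reference
    norm = np.linalg.norm(diff)
    if norm == 0.0:
        # The input equals the reference: no expression unit responds.
        return {label: 0.0 for label in directions}
    unit = diff / norm
    return {
        label: norm * max(0.0, float(unit @ n)) ** nu
        for label, n in directions.items()
    }


def classify(x, domain, references, directions):
    """Return the most active expression unit and its activity."""
    activity = encode(x, references[domain], directions)
    label = max(activity, key=activity.get)
    return label, activity[label]


# Toy data (hypothetical): two domains with different reference vectors.
references = {
    "human": np.array([0.0, 0.0, 0.0]),
    "avatar": np.array([5.0, -2.0, 1.0]),
}

# One labelled example per expression, all from the "human" domain.
directions = fit_directions(
    {"happy": [1.0, 0.0, 0.0], "sad": [0.0, 1.0, 0.0]},
    references["human"],
)

# Transfer to the "avatar" domain using only its reference vector.
weak_happy = classify([5.5, -2.0, 1.0], "avatar", references, directions)
strong_sad = classify([5.0, 0.0, 1.0], "avatar", references, directions)
print(weak_happy, strong_sad)
```

In this toy setup the tuning directions are learned once on the "human" domain and reused on the "avatar" domain, which needs only its own reference vector. The returned activity scales with the distance from the reference, which is how an intensity readout could work under these assumptions.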