According to the Stimulus-Organism-Response (SOR) theory, all human behavioral reactions are stimulated by context: people process the stimulus they receive and produce an appropriate reaction. This implies that, in a specific context, a person can react differently to the same input stimulus depending on their internal state and other contextual factors. Analogously, in dyadic interactions, humans communicate using verbal and nonverbal cues, and a broad spectrum of listener nonverbal reactions may be appropriate responses to a specific speaker behavior. An existing body of work has investigated the problem of automatically generating an appropriate reaction for a given input. However, none of it has attempted to automatically generate multiple appropriate reactions in the context of dyadic interactions and to evaluate the appropriateness of those reactions using objective measures. This paper first defines the facial Multiple Appropriate Reaction Generation (fMARG) task, for the first time in the literature, and proposes a new set of objective metrics for evaluating the appropriateness of the generated reactions. It then introduces a framework to predict, generate, and evaluate multiple appropriate facial reactions.