Humor is a substantial element of human social behavior, affect, and cognition. Its automatic understanding can facilitate more naturalistic human-AI interaction. Current methods of humor detection have been based exclusively on staged data, making them inadequate for "real-world" applications. We contribute to addressing this deficiency by introducing the novel Passau-Spontaneous Football Coach Humor (Passau-SFCH) dataset, comprising about 11 hours of recordings. The Passau-SFCH dataset is annotated for the presence of humor and for its dimensions (sentiment and direction) as proposed in Martin's Humor Styles Questionnaire. We conduct a series of experiments employing pretrained Transformers, convolutional neural networks, and expert-designed features. We analyze the performance of each modality (text, audio, video) for spontaneous humor recognition and investigate their complementarity. Our findings suggest that for the automatic analysis of humor and its sentiment, facial expressions are most promising, while humor direction can be best modeled via text-based features. Further, we experiment with different multimodal approaches to humor recognition, including decision-level fusion and MulT, a multimodal Transformer approach. In this context, we propose a novel multimodal architecture that yields the best overall results. Finally, we make our code publicly available at https://www.github.com/lc0197/passau-sfch. The Passau-SFCH dataset is available upon request.
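To make the notion of decision-level fusion concrete, the following minimal sketch combines per-segment humor probabilities from separately trained text, audio, and video models by weighted averaging. It is an illustration under stated assumptions, not the implementation in our repository: the function name `late_fusion`, the use of weighted averaging as the fusion rule, and all scores shown are hypothetical.

```python
from typing import Dict, List, Optional


def late_fusion(
    modality_probs: Dict[str, List[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Fuse per-segment humor probabilities by weighted averaging.

    modality_probs maps a modality name to one probability per segment.
    weights maps a modality name to a non-negative weight (default: equal).
    Both the fusion rule and the names are assumptions for illustration.
    """
    if not modality_probs:
        raise ValueError("at least one modality is required")
    names = list(modality_probs)
    lengths = {len(modality_probs[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all modalities must score the same segments")
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    n_segments = lengths.pop()
    # Average the unimodal decisions segment by segment.
    return [
        sum(weights[name] * modality_probs[name][i] for name in names) / total
        for i in range(n_segments)
    ]


# Made-up unimodal scores for two segments (not results from the paper).
scores = {
    "text": [0.25, 0.75],
    "audio": [0.5, 0.5],
    "video": [0.75, 1.0],
}
fused = late_fusion(scores)
# Up-weighting one modality, here video, is a purely illustrative choice.
fused_weighted = late_fusion(scores, {"text": 1.0, "audio": 1.0, "video": 2.0})
```

In such a scheme the unimodal models are trained independently and only their outputs are combined, in contrast to approaches like MulT, where the modalities interact inside the model.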