Multimodal reasoning, an area of artificial intelligence that aims to make inferences from multimodal signals such as vision, language, and speech, has drawn increasing attention in recent years. People with different personalities may respond differently to the same situation; however, such individual personalities have been ignored in previous studies. In this work, we introduce a new Personality-aware Human-centric Multimodal Reasoning (Personality-aware HMR) task and construct a corresponding dataset based on the television series The Big Bang Theory. The task is to predict the behavior of a specific person at a specific moment, given multimodal information from the moments before and after it. The Myers-Briggs Type Indicator (MBTI) was annotated and used in the task to represent individuals' personalities. We benchmark the task with three baseline methods: two adapted from related tasks and one newly proposed for this task. The experimental results demonstrate that personality can effectively improve the performance of human-centric multimodal reasoning. To address the lack of personality annotations in real-life scenes, we further introduce an extended task, Personality-predicted HMR, and propose corresponding methods that first predict the MBTI personality and then use the predicted personality to aid multimodal reasoning. The experimental results show that our method accurately predicts personality and achieves satisfactory multimodal reasoning performance without relying on personality annotations.
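The Personality-predicted HMR setting is a two-stage pipeline: infer an MBTI type first, then condition behavior prediction on it. The sketch below illustrates only that control flow, under stated assumptions. The encoding of MBTI as four binary features (one per dichotomy: E/I, S/N, T/F, J/P), the function names `encode_mbti`, `decode_mbti`, and `personality_predicted_hmr`, and the toy stage-1 model and scorer are all hypothetical. They are not the architecture proposed in this work.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# The four MBTI dichotomies; a type string picks one pole on each axis.
MBTI_AXES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]


def encode_mbti(mbti: str) -> List[float]:
    """Encode a type such as 'INTJ' as four binary features (1.0 = first pole)."""
    mbti = mbti.upper()
    if len(mbti) != 4:
        raise ValueError(f"expected 4 letters, got {mbti!r}")
    vec = []
    for letter, (first, second) in zip(mbti, MBTI_AXES):
        if letter not in (first, second):
            raise ValueError(f"invalid letter {letter!r} for axis {first}/{second}")
        vec.append(1.0 if letter == first else 0.0)
    return vec


def decode_mbti(probs: Sequence[float]) -> str:
    """Map per-axis probabilities of the first pole back to a type string."""
    if len(probs) != 4:
        raise ValueError("expected one probability per MBTI axis")
    return "".join(first if p >= 0.5 else second
                   for p, (first, second) in zip(probs, MBTI_AXES))


def personality_predicted_hmr(
    context: List[float],                                   # hypothetical fused past/future features
    candidates: Dict[str, List[float]],                     # hypothetical candidate behaviors
    personality_model: Callable[[List[float]], List[float]],           # stage 1
    score: Callable[[List[float], List[float], List[float]], float],   # stage 2
) -> Tuple[str, str]:
    """Stage 1 predicts MBTI from context; stage 2 ranks behaviors given it."""
    mbti = decode_mbti(personality_model(context))
    persona = encode_mbti(mbti)
    best = max(candidates, key=lambda name: score(context, persona, candidates[name]))
    return mbti, best


# Toy usage: a fixed stage-1 output and a dot-product scorer (both hypothetical).
result = personality_predicted_hmr(
    context=[0.2, 0.9, 0.1, 0.7],
    candidates={"leave quietly": [0.0, 0.0, 1.0, 1.0],
                "start a conversation": [1.0, 0.0, 0.0, 0.0]},
    personality_model=lambda ctx: [0.2, 0.3, 0.8, 0.6],
    score=lambda ctx, persona, cand: sum(p * c for p, c in zip(persona, cand)),
)
print(result)  # -> ('INTJ', 'leave quietly')
```

Keeping the predicted type as a discrete string between the stages mirrors the "predict first, then use" description. A real system might instead pass soft per-axis probabilities to stage 2.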