This study investigates the efficacy of using multimodal machine learning techniques to detect deception in dyadic interactions, focusing on the integration of data from both the deceiver and the deceived. We compare early and late fusion approaches, utilizing audio and video data (specifically, Action Units and gaze information) across all possible combinations of modalities and participants. Our dataset, newly collected from Swedish native speakers engaged in truth or lie scenarios on emotionally relevant topics, serves as the basis for our analysis. The results demonstrate that incorporating both speech and facial information yields superior performance compared to single-modality approaches. Moreover, including data from both participants significantly enhances deception detection accuracy, with the best performance (71%) achieved using a late fusion strategy applied to both modalities and participants. These findings align with psychological theories suggesting differential control of facial and vocal expressions during initial interactions. As the first study of its kind on a Scandinavian cohort, this research lays the groundwork for future investigations into dyadic interactions, particularly within psychotherapy settings.
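Since the abstract contrasts early and late fusion, the following minimal sketch illustrates the distinction under stated assumptions: synthetic feature matrices stand in for the audio and video (Action Unit and gaze) features, labels are random, and logistic regression is used as a placeholder classifier. It is not the paper's actual pipeline, feature set, or model choice.

```python
# Illustrative sketch of early vs. late fusion for two modalities.
# Features, labels, and the classifier are assumptions, not the paper's setup.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 200
X_audio = rng.normal(size=(n, 20))   # hypothetical speech features per sample
X_video = rng.normal(size=(n, 35))   # hypothetical Action Unit + gaze features
y = rng.integers(0, 2, size=n)       # synthetic labels: 1 = lie, 0 = truth

# Early fusion: concatenate modality features, train a single classifier.
X_early = np.hstack([X_audio, X_video])
clf_early = LogisticRegression(max_iter=1000).fit(X_early, y)
y_early = clf_early.predict(X_early)

# Late fusion: train one classifier per modality, then combine their
# predicted probabilities (here, a simple unweighted average).
clf_a = LogisticRegression(max_iter=1000).fit(X_audio, y)
clf_v = LogisticRegression(max_iter=1000).fit(X_video, y)
p_late = (clf_a.predict_proba(X_audio)[:, 1]
          + clf_v.predict_proba(X_video)[:, 1]) / 2
y_late = (p_late >= 0.5).astype(int)
```

Extending the same pattern to dyads would add a second pair of feature matrices for the interlocutor, with late fusion averaging over four per-modality, per-participant classifiers; the paper's exact combination scheme is not detailed in the abstract.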