Personality is a complex, hierarchical construct typically assessed through item-level questionnaires aggregated into broad trait scores. Personality recognition models aim to infer personality traits from diverse sources of behavioral data. However, reliance on broad trait scores as ground truth, combined with limited training data, hinders generalization, since similar trait scores can manifest through diverse, context-dependent behaviors. In this work, we explore the predictive impact of the more granular hierarchical levels of the Big-Five Personality Model, facets and nuances, to enhance personality recognition from audiovisual interaction data. Using the UDIVA v0.5 dataset, we train a transformer-based model incorporating cross-modal (audiovisual) and cross-subject (dyad-aware) attention mechanisms. Results show that nuance-level models consistently outperform facet- and trait-level models, reducing mean squared error by up to 74% across interaction scenarios.