With the widespread deployment of Multimodal Large Language Models (MLLMs) in social interaction, understanding and controlling their behavior under complex personality conditions is essential. This paper introduces explicit personality conditioning and establishes a systematic evaluation framework encompassing single-personality induction, multi-personality induction, and personality switching. Experiments show that personality induction improves image captioning performance but can impair performance on tasks requiring precise reasoning, such as visual question answering (VQA). Balancing and residual effects are observed during multi-trait composition and dynamic switching, indicating that model behavior is co-modulated by both previous and current personality constraints. Existing prompt-based personality induction methods show limited transferability to multimodal settings. Our work reveals the dynamic and complex nature of personality modeling in MLLMs and underscores the need for robust, tailored methods for personality induction and evaluation. The code will be released when the paper is accepted.