With the widespread deployment of Multimodal Large Language Models (MLLMs) in social interaction, understanding and controlling their behavior under complex personality conditions is essential. This paper introduces explicit personality conditioning and establishes a systematic evaluation framework encompassing single-personality induction, multi-personality induction, and personality switching. Experiments show that personality induction improves image captioning performance but can impair performance on tasks requiring precise reasoning, such as visual question answering (VQA). Balancing and residual effects are observed during multi-trait composition and dynamic switching, indicating that model behavior is co-modulated by both previous and current personality constraints. Existing prompt-based personality induction methods show limited transferability to multimodal settings. Our work reveals the dynamic and complex nature of personality modeling in MLLMs and underscores the need for robust, tailored methods for personality induction and evaluation. The code will be released when the paper is accepted.