Large Language Models (LLMs) are integral to applications such as conversational agents and content creation, where precise control over a model's personality is essential for maintaining tone, consistency, and user engagement. However, prevailing prompt-based or fine-tuning approaches either lack robustness or demand large-scale training data, making them costly and impractical. In this paper, we present PALETTE (Personality Adjustment by LLM SElf-TargeTed quEries), a novel method for personality editing in LLMs. Our approach introduces adjustment queries, in which self-referential statements grounded in psychological constructs are treated analogously to factual knowledge, enabling direct editing of personality-related responses. Unlike fine-tuning, PALETTE requires only 12 editing samples to achieve substantial improvements in alignment across personality dimensions. Experimental results from both automatic and human evaluations demonstrate that our method enables more stable and well-balanced personality control in LLMs.