Research on Large Language Model (LLM) Role-Playing Agents has focused on various construction methodologies, yet it remains unclear which aspects of character profiles genuinely drive role-playing quality. To bridge this gap, we introduce a systematic diagnostic framework that disentangles the impact of character profiles along three axes: Familiarity (Known vs. Unknown), Structure (Structured vs. Unstructured), and Disposition (Moral vs. Immoral). To investigate these axes, we design a unified hierarchical schema (5 dimensions, 28 fields) that standardizes character attributes, and we construct a controlled dataset of 211 personas varying along the three axes. We evaluate five LLMs on single-turn and multi-turn benchmarks. Our results reveal a striking asymmetry: Familiarity and Structure show negligible impact, while Disposition produces a large, consistent performance degradation for immoral characters across all conditions. This performance drop concentrates in motivation-related attributes, indicating that alignment priors actively suppress tokens needed for faithful immoral portrayal. To mitigate this alignment-induced bottleneck, we propose Field-Aware Contrastive Decoding (FACD), a training-free strategy that selectively amplifies suppressed immoral-field signals, significantly reducing the Moral-Immoral performance gap without sacrificing moral-character performance.
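The abstract does not spell out how FACD is formulated, so the following is only a minimal sketch of generic contrastive decoding, the family of techniques the name points to, and not the authors' implementation. It assumes two sets of next-token logits are available: one computed with a given profile field in the prompt and one with that field removed. The function names, the `alpha` weight, and the toy logits are all hypothetical.

```python
import math


def log_softmax(logits):
    """Convert raw logits to log-probabilities in a numerically stable way."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def contrastive_scores(with_field, without_field, alpha=1.0):
    """Boost tokens that become more likely when the profile field is present.

    score = log p(with field) + alpha * (log p(with field) - log p(without field))
    alpha is an assumed amplification weight; alpha = 0 recovers ordinary decoding.
    """
    lp_with = log_softmax(with_field)
    lp_without = log_softmax(without_field)
    return [a + alpha * (a - b) for a, b in zip(lp_with, lp_without)]


def pick_token(with_field, without_field, alpha=1.0):
    """Greedy choice over the contrastive scores; returns a token index."""
    scores = contrastive_scores(with_field, without_field, alpha)
    return max(range(len(scores)), key=scores.__getitem__)


# Toy three-token vocabulary (hypothetical numbers). Token 1 is the one the
# profile field supports, but it narrowly loses under ordinary greedy decoding.
with_field = [2.0, 1.8, 0.0]
without_field = [2.0, 0.5, 0.0]

plain_choice = pick_token(with_field, without_field, alpha=0.0)
amplified_choice = pick_token(with_field, without_field, alpha=1.0)
```

In this toy case the plain choice is token 0, while the amplified choice is token 1, the token whose probability rises most when the field is included. A field-aware variant would presumably apply such amplification only for selected fields (for example, motivation-related ones), but how the fields are selected and weighted is not described in the abstract.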