In this work, we report on how effectively prompts can tailor the personality and conversational style of a conversational agent based on GPT-3.5 and GPT-4. We combine three personality dimensions, each with two levels, to create eight conversational agent archetypes. We collected ten conversations per chatbot, each consisting of ten exchanges, yielding 1600 exchanges across GPT-3.5 and GPT-4. Using Linguistic Inquiry and Word Count (LIWC) analysis, we compared the eight agents on language elements including clout, authenticity, and emotion. Four language cues significantly distinguished the agents in GPT-3.5, while twelve did so in GPT-4. With thirteen of the nineteen LIWC cues appearing as significantly distinguishing overall, our results suggest that novel prompting approaches may be needed to better support the creation and evaluation of persistent conversational agent personalities or language styles.
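The study design is a full factorial over three two-level personality dimensions (2 × 2 × 2 = 8 archetypes), and the exchange total follows from 8 agents × 2 models × 10 conversations × 10 exchanges = 1600. The sketch below only checks that arithmetic. The abstract does not name the three dimensions, so the labels `dimension_a`, `dimension_b`, `dimension_c` and the level names `low`/`high` are hypothetical placeholders, not the paper's actual dimensions.

```python
from itertools import product

# Hypothetical placeholders: the abstract does not name the three personality
# dimensions, so these labels only stand in for "dimension k at one of two levels".
DIMENSIONS = {
    "dimension_a": ("low", "high"),
    "dimension_b": ("low", "high"),
    "dimension_c": ("low", "high"),
}

MODELS = ("GPT-3.5", "GPT-4")
CONVERSATIONS_PER_AGENT = 10
EXCHANGES_PER_CONVERSATION = 10


def archetypes(dimensions):
    """Enumerate every combination of levels, i.e. a full factorial design."""
    names = list(dimensions)
    return [dict(zip(names, levels)) for levels in product(*dimensions.values())]


def total_exchanges(n_agents, n_models, conversations, exchanges):
    """Total exchanges collected across all agents and models."""
    return n_agents * n_models * conversations * exchanges


agents = archetypes(DIMENSIONS)
print(len(agents))  # 8
print(total_exchanges(len(agents), len(MODELS),
                      CONVERSATIONS_PER_AGENT, EXCHANGES_PER_CONVERSATION))  # 1600
```

Each model therefore contributes 800 exchanges (8 × 10 × 10), and the two models together give the 1600 reported above.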