Prompting serves as the major way humans interact with Large Language Models (LLM). Commercial AI systems commonly define the role of the LLM in system prompts. For example, ChatGPT uses "You are a helpful assistant" as part of the default system prompt. But is "a helpful assistant" the best role for LLMs? In this study, we present a systematic evaluation of how social roles in system prompts affect model performance. We curate a list of 162 roles covering 6 types of interpersonal relationships and 8 types of occupations. Through extensive analysis of 3 popular LLMs and 2457 questions, we show that adding interpersonal roles in prompts consistently improves the models' performance over a range of questions. Moreover, while we find that using gender-neutral roles and specifying the role as the audience leads to better performances, predicting which role leads to the best performance remains a challenging task, and that frequency, similarity, and perplexity do not fully explain the effect of social roles on model performances. Our results can help inform the design of system prompts for AI systems. Code and data are available at https://github.com/Jiaxin-Pei/Prompting-with-Social-Roles.
翻译:提示(Prompting)是人类与大语言模型(LLM)交互的主要方式。商业AI系统通常在系统提示中定义LLM的角色。例如,ChatGPT将“你是一个有帮助的助手”作为默认系统提示的一部分。但“有帮助的助手”是LLM的最佳角色吗?在本研究中,我们对系统提示中社会角色如何影响模型性能进行了系统评估。我们整理了一份包含162个角色的列表,涵盖6种人际关系类型和8种职业类型。通过对3个流行LLM和2457个问题的广泛分析,我们发现,在提示中添加人际关系角色能持续提升模型在多个问题上的性能。此外,尽管我们发现使用中性性别角色和将角色指定为受众能带来更好的性能,但预测哪个角色能带来最佳性能仍是一项具有挑战性的任务,并且频率、相似性和困惑度并不能完全解释社会角色对模型性能的影响。我们的结果有助于为AI系统的系统提示设计提供参考。代码和数据可在https://github.com/Jiaxin-Pei/Prompting-with-Social-Roles获取。