Both how users ask questions and what they ask can provide insight into their current status, including their personality, emotions, and psychology. Instead of prompting large language models (LLMs) directly, we explore how chain-of-thought prompting can help them reason and plan according to the user status, with the aim of giving more personalized and engaging responses to user queries. To this end, we first construct a benchmark of 6 dialogue or question-answering datasets in both English and Chinese, covering 3 different aspects of user status (\textit{personality}, \textit{emotion}, and \textit{psychology}). We then prompt the LLMs to generate responses, treating the inferred user status as an intermediate reasoning step. We also propose a novel demonstration selection strategy that selects demonstrations by the semantic similarity of their intermediate reasoning rather than of the test queries. To evaluate the effectiveness and robustness of our approach, we conduct extensive experiments with 7 LLMs under zero-shot and one-shot settings. The experimental results show that our approach consistently outperforms standard prompting in terms of both \textit{helpfulness} and \textit{acceptness} across all datasets, regardless of the LLM used. The code and dataset can be found at \url{https://github.com/ruleGreen/Dialogue\_CoT.git}.
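The demonstration selection idea described in the abstract, choosing a one-shot example by the similarity of intermediate reasoning (the inferred user status) rather than by the similarity of the queries themselves, can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function stands in for whatever sentence encoder is actually used, and the names `select_demonstration`, `POOL`, and the example entries are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; an assumed stand-in for a real sentence encoder."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def select_demonstration(test_status, pool):
    """Return the pool entry whose user-status text is most similar to test_status."""
    target = embed(test_status)
    return max(pool, key=lambda demo: cosine(target, embed(demo["status"])))


# Hypothetical demonstration pool: each entry pairs a query with an inferred
# user status (the intermediate reasoning) and a reference response.
POOL = [
    {
        "query": "How do I fix this bug? I've tried everything.",
        "status": "The user feels frustrated and anxious about a deadline.",
        "response": "That sounds stressful. Let's narrow it down step by step.",
    },
    {
        "query": "Any tips for my first marathon?",
        "status": "The user feels excited and motivated about a new challenge.",
        "response": "Congratulations on signing up! Start with a gradual plan.",
    },
]

# In a two-step setup, this status would first be generated by the LLM from
# the test query; it is hard-coded here purely for illustration.
test_status = "The user seems anxious and frustrated after repeated failures."
best = select_demonstration(test_status, POOL)
print(best["query"])
```

The selected entry would then be placed in the prompt as the one-shot demonstration. Matching on the status text means two queries about unrelated topics can still share a demonstration when the users behind them appear to be in a similar state.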