The rapid evolution of large language models such as GPT-4 Turbo represents a paradigm shift in digital interaction and content engagement. Although these models encode vast amounts of human-generated knowledge and excel at processing diverse data types, recent research shows that they often fail to respond accurately to specific user intents, which increases user dissatisfaction. Based on a fine-grained intent taxonomy and intent-based prompt reformulations, we analyze (1) the quality of intent recognition and (2) user satisfaction with answers to intent-based prompt reformulations for two recent ChatGPT models, GPT-3.5 Turbo and GPT-4 Turbo. The results reveal that GPT-4 outperforms GPT-3.5 on the recognition of common intents, but is often outperformed by GPT-3.5 on the recognition of less frequent intents. Moreover, when the user intent is correctly recognized, users are more satisfied with GPT-4's answers to intent-based reformulations than with GPT-3.5's; however, they tend to be more satisfied with the models' answers to their original prompts than with the answers to the reformulated ones. Finally, the study indicates that users can quickly learn to formulate their prompts more effectively once they are shown possible reformulation templates.