The advancement of Large Language Models (LLMs) has led to their widespread use across a broad spectrum of tasks, including decision-making. Prior studies have compared the decision-making abilities of LLMs with those of humans from a psychological perspective. However, these studies have not always properly accounted for the sensitivity of LLMs' behavior to hyperparameters and to variations in the prompt. In this study, we examine LLMs' performance on the Horizon decision-making task studied by Binz and Schulz (2023), analyzing how LLMs respond to variations in prompts and hyperparameters. By experimenting on three OpenAI language models with different capabilities, we observe that their decision-making abilities fluctuate with the input prompt and the temperature setting. Contrary to previous findings, language models display a human-like exploration-exploitation tradeoff after simple adjustments to the prompt.