There has been increasing interest in investigating the behaviours of large language models (LLMs) and LLM-powered chatbots by treating an LLM as a participant in a psychological experiment. We therefore developed an R package called "MacBehaviour" that provides a single interface to more than 60 language models (e.g., OpenAI's GPT family, the Claude family, Gemini, the Llama family, and open-source models) and streamlines the process of running LLM behaviour experiments. The package offers a comprehensive set of functions designed for LLM experiments, covering experiment design, stimulus presentation, manipulation of model behaviour, and logging of responses and token probabilities. To demonstrate the utility and effectiveness of "MacBehaviour," we conducted three validation experiments on three LLMs (GPT-3.5, Llama-2 7B, and Vicuna-1.5 13B) to replicate the sound-gender association in LLMs. The results consistently showed that these models exhibit human-like tendencies to infer gender from novel personal names based on their phonology, as previously demonstrated (Cai et al., 2023). In summary, "MacBehaviour" is an R package for machine behaviour studies that offers a user-friendly interface and comprehensive features to simplify and standardize the experimental process.