The Wizard of Oz (WoZ) method is a widely adopted research approach in which a human Wizard ``role-plays'' a technology that is not yet readily available and interacts with participants to elicit user behaviors and probe the design space. As modern large language models (LLMs) grow more capable of role-play, they can serve as Wizards in WoZ experiments, with better scalability and lower cost than the traditional approach. However, methodological guidance on responsibly applying LLMs in WoZ experiments and a systematic evaluation of LLMs' role-playing ability are both lacking. Through two LLM-powered WoZ studies, we take the first step towards identifying an experiment lifecycle that lets researchers safely integrate LLMs into WoZ experiments and interpret data generated in settings where the Wizard is role-played by an LLM. We also contribute a heuristic-based evaluation framework that estimates LLMs' role-playing ability in WoZ experiments and reveals LLMs' behavior patterns at scale.