Large language models (LLMs) such as GPT have proven widely successful on natural language understanding tasks based on written text documents. In this paper, we investigate an LLM's performance on transcribed recordings of a group oral communication task, in which utterances are often truncated or otherwise ill-formed. We propose a new group task experiment involving a puzzle with several milestones that can be achieved in any order, and we investigate methods for processing transcripts to detect whether, when, and by whom a milestone has been completed. We demonstrate that iteratively prompting GPT with transcript chunks outperforms semantic similarity search over text embeddings, and we further discuss the quality and randomness of GPT responses under different context window sizes.