Large language models (LLMs), such as ChatGPT and GPT-4, are seeing widespread real-world use. Yet these LLMs are closed-source, and little is known about their performance in real-world use cases. In academia, LLM performance is often measured on benchmarks that may have leaked into the LLMs' training data. We apply and evaluate ChatGPT and GPT-4 on the real-world task of cost-efficiently extracting insights from a text corpus published after the LLMs were trained. We extract 4,392 research challenges in over 90 topics from the 2023 CHI conference proceedings and visualize them for interactive exploration. We critically evaluate the LLMs on this practical task and conclude that the combination of ChatGPT and GPT-4 is an excellent, cost-efficient means of analyzing a corpus at scale. Cost-efficiency is key for prototyping research ideas and analyzing text corpora from different perspectives, with implications for applying LLMs in academia and practice.