Mapping the Challenges of HCI: An Application and Evaluation of ChatGPT and GPT-4 for Mining Insights at Scale

Large language models (LLMs), such as ChatGPT and GPT-4, are gaining widespread real-world use. Yet, these LLMs are closed-source, and little is known about their performance in real-world use cases. In academia, LLM performance is often measured on benchmarks that may have leaked into the LLM's training data. We apply and evaluate ChatGPT and GPT-4 for the real-world task of cost-efficiently extracting insights from a text corpus published after the LLMs were trained. We extract 4,392 research challenges in over 90 topics from the 2023 CHI conference proceedings and visualize the research challenges for interactive exploration. We critically evaluate the LLMs on this practical task and conclude that the combination of ChatGPT and GPT-4 makes an excellent cost-efficient means for analyzing a corpus at scale. Cost-efficiency is key for prototyping research ideas and analyzing text corpora from different perspectives, with implications for applying LLMs in academia and practice.

Related content

GPT-4

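In the early hours of March 15, 2023 (Beijing time), OpenAI, the developer of ChatGPT, released GPT-4, a new multimodal pretrained large model. It is more reliable and more creative, can handle more fine-grained instructions, and can generate content from both image and text prompts. Specifically, compared with the previous generation of models, GPT-4 is a major leap: it accepts image and text input and has strong image-recognition capabilities; the text input limit is greatly increased, so that in ChatGPT mode GPT-4 can process more than 25,000 words of text and handle more detailed instructions; and answer accuracy is also significantly improved.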
