Examining the Potential and Pitfalls of ChatGPT in Science and Engineering Problem-Solving

This study explores the capabilities of OpenAI's ChatGPT in solving different types of physics problems. ChatGPT (with GPT-4) was queried to solve a total of 40 problems from a college-level engineering physics course. These problems ranged from well-specified problems, where all data required for solving the problem were provided, to under-specified, real-world problems where not all necessary data were given. Our findings show that ChatGPT successfully solved 62.5% of the well-specified problems, but its accuracy dropped to 8.3% for under-specified problems. Analysis of the model's incorrect solutions revealed three distinct failure modes: 1) failure to construct accurate models of the physical world, 2) failure to make reasonable assumptions about missing data, and 3) calculation errors. The study offers implications for how to leverage LLM-augmented instructional materials to enhance STEM education. The insights also contribute to the broader discourse on AI's strengths and limitations, serving both educators aiming to leverage the technology and researchers investigating human-AI collaboration frameworks for problem-solving and decision-making.
