This study explores the capabilities of OpenAI's ChatGPT in solving different types of physics problems. ChatGPT (with GPT-4) was asked to solve a total of 40 problems from a college-level engineering physics course. The problems ranged from well-specified problems, where all data required for the solution were provided, to under-specified, real-world problems, where not all necessary data were given. Our findings show that ChatGPT successfully solved 62.5% of the well-specified problems, but its accuracy dropped to 8.3% on the under-specified problems. Analysis of the model's incorrect solutions revealed three distinct failure modes: 1) failure to construct accurate models of the physical world, 2) failure to make reasonable assumptions about missing data, and 3) calculation errors. The study offers implications for how to leverage LLM-augmented instructional materials to enhance STEM education. The insights also contribute to the broader discourse on AI's strengths and limitations, serving both educators aiming to leverage the technology and researchers investigating human-AI collaboration frameworks for problem-solving and decision-making.