Our work demonstrates that a large language model (LLM) pre-trained on text can solve not only pure math word problems but also physics word problems, that is, problems solved by calculation and inference based on prior physical knowledge. We collect and annotate the first physics word problem dataset, PhysQA, which contains over 1000 junior high school physics word problems (on Kinematics, Mass & Density, Mechanics, Heat, and Electricity). We then use OpenAI's GPT3.5 to generate answers to these problems and find that GPT3.5 automatically solves 49.3% of the problems in the zero-shot setting and 73.2% in the few-shot setting. These results show that, when a similar problem and its answer are used as the prompt, an LLM can solve elementary physics word problems at a level approaching human performance. Besides automatically solving problems, GPT3.5 can also summarize the knowledge or topic examined by a problem, generate relevant explanations, and synthesize new physics word problems based on the input problems. Our work is the first research on automatically solving, explaining, and generating physics word problems of multiple types and scenes, and we achieve acceptable, state-of-the-art accuracy, which demonstrates the potential for further application of LLMs in secondary education.
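The few-shot setting described above, in which a similar problem and its answer are placed in the prompt, can be illustrated with a minimal sketch. The abstract does not say how similar problems are selected or how the prompt is worded, so everything here is an assumption made for illustration: the word-overlap (Jaccard) similarity, the `Problem:`/`Answer:` template, the helper names, and the two sample problems (which are not taken from PhysQA). The resulting string would be sent to the language model; that call is omitted.

```python
import re


def tokenize(text):
    """Lowercase a problem statement and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity in [0, 1]; an assumed stand-in for the
    (unspecified) similarity measure used to pick the example problem."""
    ta, tb = tokenize(a), tokenize(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def most_similar(query, pool):
    """Return the solved problem in `pool` closest to `query`."""
    return max(pool, key=lambda ex: jaccard(query, ex["question"]))


def build_prompt(query, pool):
    """Build a one-shot prompt: the similar solved problem, then the new one."""
    ex = most_similar(query, pool)
    return (
        f"Problem: {ex['question']}\n"
        f"Answer: {ex['answer']}\n\n"
        f"Problem: {query}\n"
        "Answer:"
    )


# Hypothetical solved problems, invented for this sketch (not from PhysQA).
POOL = [
    {
        "question": "A car travels 120 km in 2 hours. What is its average speed?",
        "answer": "speed = distance / time = 120 km / 2 h = 60 km/h",
    },
    {
        "question": "A block has a mass of 500 g and a volume of 250 cm3. What is its density?",
        "answer": "density = mass / volume = 500 g / 250 cm3 = 2 g/cm3",
    },
]

if __name__ == "__main__":
    query = "A train travels 300 km in 4 hours. What is its average speed?"
    print(build_prompt(query, POOL))
```

In the zero-shot setting the prompt would contain only the final `Problem:`/`Answer:` pair, with no solved example in front of it.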