ChatGPT is a powerful language model from OpenAI that is arguably able to comprehend and generate text. ChatGPT is expected to have a large impact on society, research, and education. An essential step toward understanding this impact is to study its domain-specific question-answering capabilities. Here, we perform a systematic empirical assessment of its ability to answer questions across the natural science and engineering domains. We collected 594 questions on natural science and engineering topics from 198 faculty members across five faculties at Delft University of Technology. After collecting the answers from ChatGPT, the participants assessed the quality of the answers using a systematic scheme. Our results show that the answers from ChatGPT are, on average, perceived as "mostly correct". Two major trends are that the rating of the ChatGPT answers significantly decreases (i) as the educational level of the question increases and (ii) as we evaluate skills beyond scientific knowledge, e.g., critical attitude.