ChatGPT is a powerful language model from OpenAI that is arguably able to comprehend and generate text. ChatGPT is expected to have a large impact on society, research, and education. An essential step toward understanding ChatGPT's expected impact is to study its domain-specific answering capabilities. Here, we perform a systematic empirical assessment of its ability to answer questions across the natural science and engineering domains. We collected 594 questions from 198 faculty members across 5 faculties at Delft University of Technology. After collecting the answers from ChatGPT, the participants assessed the quality of the answers using a systematic scheme. Our results show that the answers from ChatGPT are on average perceived as "mostly correct". Two major trends are that the rating of the ChatGPT answers significantly decreases (i) as the complexity level of the question increases and (ii) as we evaluate skills beyond scientific knowledge, e.g., critical attitude.