This work examines how well OpenAI's ChatGPT-4 sources scientific references across a range of research disciplines. Our analysis covers Computer Science (CS), Mechanical Engineering (ME), Electrical Engineering (EE), Biomedical Engineering (BME), and Medicine, as well as their more specialized sub-domains. Our empirical findings indicate that ChatGPT-4's performance varies significantly across these disciplines. Notably, the validity rate of suggested articles in CS, BME, and Medicine exceeds 65%, whereas in ME and EE none of the suggested articles could be verified as valid. Further, when asked to retrieve articles on niche research topics, ChatGPT-4 tends to return references that match the broader thematic area rather than the narrowly defined topic of interest. This disparity in accuracy across research fields suggests that the model may need refinement before it can be relied on in academic research. Our investigation offers insight into the current capabilities and limitations of AI-powered tools in scholarly research and emphasizes the indispensable role of human oversight and rigorous validation when using such models for academic purposes.