Research automation efforts usually employ AI as a tool to automate specific tasks within the research process. An AI that truly conducts research on its own must independently generate hypotheses, design verification plans, and execute the verification. We therefore investigated whether an AI could autonomously generate and verify hypotheses for a toy machine learning research problem. We prompted GPT-4 to generate hypotheses and Python code for hypothesis verification, providing only limited methodological guidance. Our findings suggest that, in some instances, GPT-4 can autonomously generate and validate hypotheses without detailed guidance. While this is a promising result, we also found that none of the verifications were flawless, and significant challenges remain in achieving autonomous, human-level research using only generic instructions. These findings underscore the need for continued exploration to develop a general and autonomous AI researcher.