Training large transformers with next-token prediction has driven groundbreaking advances in AI. Although this generative approach has produced impressive results, it relies heavily on human supervision. Even state-of-the-art models such as ChatGPT depend on fine-tuning with human demonstrations, which demands extensive human input and domain expertise. This reliance on human oversight is a significant obstacle to further progress in AI. To address this limitation, we propose a novel paradigm, Exploratory AI (EAI), that aims to generate high-quality training data autonomously. Drawing inspiration from unsupervised reinforcement learning (RL) pretraining, EAI performs exploration in the natural language space by using large language models to assess the novelty of generated content. Our approach has two key components: an actor that generates novel content following exploration principles, and a critic that evaluates the generated content and offers critiques to guide the actor. Empirical evaluations demonstrate that EAI significantly boosts model performance on complex reasoning tasks, addressing the limitations of human-intensive supervision.
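The actor and critic interaction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: every name here (`jaccard_novelty`, `critic`, `explore`, `make_toy_actor`) is hypothetical, the actor is a toy that replays a fixed pool of candidates rather than a language model, and novelty is approximated with word-set overlap instead of an LLM judgment. Only the loop structure (generate, critique, revise, keep what is novel) follows the description above.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical actor signature: (current buffer, last critique) -> new candidate.
Actor = Callable[[List[str], str], str]


def jaccard_novelty(candidate: str, buffer: List[str]) -> float:
    """Stand-in novelty score (assumption: replaces an LLM-based assessment).

    Returns 1 minus the highest word-set Jaccard overlap with any buffer item.
    """
    cand = set(candidate.lower().split())
    if not cand:
        return 0.0  # an empty candidate is never considered novel
    best = 0.0
    for sample in buffer:
        other = set(sample.lower().split())
        best = max(best, len(cand & other) / len(cand | other))
    return 1.0 - best


def critic(candidate: str, buffer: List[str], threshold: float) -> Tuple[bool, str]:
    """Accept the candidate if it is novel enough, otherwise return a critique."""
    score = jaccard_novelty(candidate, buffer)
    if score >= threshold:
        return True, ""
    return False, f"Too similar to existing samples (novelty {score:.2f}); try a different topic."


def explore(actor: Actor, seed: List[str], rounds: int,
            max_revisions: int = 3, threshold: float = 0.5) -> List[str]:
    """Grow a buffer of samples by letting the critic guide the actor."""
    buffer = list(seed)
    for _ in range(rounds):
        critique = ""
        for _ in range(max_revisions):
            candidate = actor(buffer, critique)
            accepted, critique = critic(candidate, buffer, threshold)
            if accepted:
                buffer.append(candidate)
                break  # move on to the next round once a novel sample is found
    return buffer


def make_toy_actor(pool: Iterable[str]) -> Actor:
    """Toy actor that ignores its inputs and replays a fixed pool (illustration only)."""
    it = iter(pool)

    def actor(buffer: List[str], critique: str) -> str:
        return next(it, "")

    return actor


seed = ["Solve 2x + 3 = 7 for x"]
pool = [
    "Solve 2x + 3 = 9 for x",  # near-duplicate of the seed
    "How many ways can 5 books be arranged on a shelf",
    "Find the area of a circle with radius 3",
]
grown = explore(make_toy_actor(pool), seed, rounds=2)
for sample in grown:
    print(sample)
```

In this toy run the near-duplicate equation is rejected by the critic, and the actor's next attempt is accepted instead. In a setting closer to the one the abstract describes, both `actor` and the novelty check inside `critic` would presumably be language-model calls, with the critique text fed back into the actor's prompt.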