Closing the Auto-Research Loop: An AI Co-Scientist for Production Search Ranking

We present an AI Co-Scientist framework that closes the research loop for the production search-ranking system of a large online travel platform -- pairing LLM agents with direct cloud-compute access so that idea generation, code implementation, GPU experimentation, and result analysis iterate end-to-end with a human scientist in the loop. The framework uses a hybrid agent architecture: single-LLM agents handle routine work, while multi-LLM consensus (GPT-5.2, Gemini Pro 3, Claude Opus 4.5) is invoked for higher-stakes decisions. On the production ranking task, a human-designed transformer baseline (V2) yielded $+0.118\%$ over a pre-transformer baseline (V1); the AI Co-Scientist's automated loop on top of V2 contributed an additional $+0.083\%$, for a combined $+0.201\%$ offline gain delivered in roughly one extra week of wall-clock time (single-run numbers; statistical limits discussed in the paper). The most useful AI proposals -- unified long-sequence layouts, slot-type embeddings, and multi-phase learning-rate schedules -- are standard practice in NLP and Vision but were absent from our production stack, suggesting that LLM agents can serve as cross-disciplinary connectors for ranking teams. We also report deployment context, negative results, and lessons learned.
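The hybrid agent architecture described above can be illustrated with a minimal sketch. The abstract does not specify how stakes are assigned or what "consensus" means, so the sketch makes several assumptions. Each task carries a `high_stakes` flag set upstream, each model is wrapped as a function that returns a short decision string, and consensus is taken to mean a strict majority vote, with `None` returned (escalate to the human scientist) when the models disagree. `Task`, `run_task`, `consensus`, `fixed` and the stub models are hypothetical names, not the framework's actual API, and no real LLM is called.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# Hypothetical interface: each "model" maps a prompt to a short decision string.
ModelFn = Callable[[str], str]


@dataclass
class Task:
    prompt: str
    high_stakes: bool  # assumption: stakes are flagged upstream, e.g. by the human scientist


def consensus(answers: Sequence[str], quorum: int) -> Optional[str]:
    """Return the answer given by at least `quorum` models, or None to escalate to the human."""
    if not answers:
        return None
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if votes >= quorum else None


def run_task(task: Task, default: ModelFn, panel: Sequence[ModelFn]) -> Optional[str]:
    """Route routine work to a single model and higher-stakes decisions to a majority vote."""
    if not task.high_stakes:
        return default(task.prompt)
    answers = [model(task.prompt) for model in panel]
    return consensus(answers, quorum=len(panel) // 2 + 1)


def fixed(answer: str) -> ModelFn:
    """Stub model that always returns `answer` (stands in for a real LLM client)."""
    return lambda prompt: answer


# Stubs standing in for GPT-5.2, Gemini Pro 3 and Claude Opus 4.5; no real API is called.
gpt, gemini, claude = fixed("accept"), fixed("accept"), fixed("reject")
panel = [gpt, gemini, claude]

routine = Task("Summarize the metrics of the last run.", high_stakes=False)
risky = Task("Promote this candidate to a production experiment?", high_stakes=True)
print(run_task(routine, gpt, panel))  # accept
print(run_task(risky, gpt, panel))    # accept (2 of 3 models agree)
```

Requiring a strict majority and returning `None` otherwise is only one plausible design choice: it turns disagreement among the models into a signal to hand the decision back to the human scientist instead of letting any single model decide.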
