AVO: Agentic Variation Operators for Autonomous Evolutionary Search

Terry Chen, Zhifan Ye, Bing Xu, Zihao Ye, Timmy Liu, Ali Hassani, Tianqi Chen, Andrew Kerr, Haicheng Wu, Yang Xu, Yu-Jung Chen, Hanfeng Chen, Aditya Kane, Ronny Krashinsky, Ming-Yu Liu, Vinod Grover, Luis Ceze, Roger Bringmann, John Tran, Wei Liu, Fung Xie, Michael Lightstone, Humphrey Shi

Agentic Variation Operators (AVO) are a new family of evolutionary variation operators that replace the fixed mutation, crossover, and hand-designed heuristics of classical evolutionary search with autonomous coding agents. Rather than confining a language model to candidate generation within a prescribed pipeline, AVO instantiates variation as a self-directed agent loop that can consult the current lineage, a domain-specific knowledge base, and execution feedback to propose, repair, critique, and verify implementation edits. We evaluate AVO on attention, one of the most aggressively optimized kernel targets in AI, running on NVIDIA Blackwell (B200) GPUs. Over 7 days of continuous autonomous evolution on multi-head attention, AVO discovers kernels that outperform cuDNN by up to 3.5% and FlashAttention-4 by up to 10.5% across the evaluated configurations. The discovered optimizations transfer readily to grouped-query attention, requiring only 30 minutes of additional autonomous adaptation and yielding gains of up to 7.0% over cuDNN and 9.3% over FlashAttention-4. Together, these results show that agentic variation operators move beyond prior LLM-in-the-loop evolutionary pipelines by elevating the agent from candidate generator to variation operator. They can also discover performance-critical micro-architectural optimizations that yield kernels surpassing state-of-the-art, expert-engineered attention implementations on today's most advanced GPU hardware.
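
To make the "variation as an agent loop" idea concrete, here is a toy sketch under stated assumptions. It is not the authors' implementation. It reduces a "kernel" to three hypothetical tuning knobs (tile_m, tile_n, stages), uses a synthetic throughput score in place of real benchmarks, and uses a toy resource budget in place of compilation and correctness checks. All names (Candidate, is_valid, throughput, agent_variation, evolve) and all knob values are illustrative assumptions. The operator consults the lineage (the latest committed version) and a small "knowledge base" of suggested knob values. It then proposes an edit, repairs it using execution feedback, verifies it, and critiques it against the parent before returning it.

```python
import random
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical stand-in for a kernel: real AVO edits source code, not a tuple of knobs.
@dataclass(frozen=True)
class Candidate:
    params: tuple  # (tile_m, tile_n, stages), all illustrative

def is_valid(c: Candidate) -> bool:
    """Toy 'execution feedback': a made-up shared-memory budget stands in for compile/correctness checks."""
    tile_m, tile_n, stages = c.params
    return tile_m * tile_n * stages <= 128 * 128 * 4

def throughput(c: Candidate) -> float:
    """Synthetic score (higher is better), peaking at (128, 128, 4); not a real performance model."""
    tile_m, tile_n, stages = c.params
    return -abs(tile_m - 128) - abs(tile_n - 128) - 10 * abs(stages - 4)

def agent_variation(lineage: List[Candidate], knowledge: List[List[int]],
                    rng: random.Random, max_steps: int = 5) -> Optional[Candidate]:
    """One invocation of a toy agentic variation operator."""
    parent = lineage[-1]                      # consult the current lineage
    parent_score = throughput(parent)
    for _ in range(max_steps):
        # Propose: change one knob to a value suggested by the knowledge base.
        idx = rng.randrange(3)
        params = list(parent.params)
        params[idx] = rng.choice(knowledge[idx])
        cand = Candidate(tuple(params))
        # Repair: on failing feedback, reduce pipeline stages until the budget fits.
        while not is_valid(cand) and cand.params[2] > 1:
            cand = Candidate(cand.params[:2] + (cand.params[2] - 1,))
        # Verify: discard anything that still fails the checks.
        if not is_valid(cand):
            continue
        # Critique: accept only edits that strictly beat the parent.
        if throughput(cand) > parent_score:
            return cand
    return None

def evolve(generations: int = 200, seed: int = 0) -> List[Candidate]:
    """Evolutionary loop that commits each improved child to the lineage."""
    rng = random.Random(seed)
    knowledge = [[32, 64, 128, 256], [32, 64, 128, 256], [1, 2, 3, 4, 5, 6]]
    lineage = [Candidate((64, 64, 2))]
    for _ in range(generations):
        child = agent_variation(lineage, knowledge, rng)
        if child is not None:
            lineage.append(child)
    return lineage

if __name__ == "__main__":
    for c in evolve():
        print(c.params, throughput(c))
```

Committing only strict improvements keeps the lineage a monotone record of accepted edits. In a real setting, throughput would be replaced by measured kernel timings, and is_valid by compilation plus numerical-correctness tests.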
