Effective code optimization in compilers plays a central role in computer and software engineering. Although compilers can be made to search the optimization space automatically, without user intervention, this is not standard practice because the search is slow and cumbersome. Here we present CodeZero, an artificial intelligence agent trained extensively on large-scale data to produce an effective optimization strategy for each program instantly, in a single trial. To cover the huge range of possible test programs, we prepare a large dataset of training programs, emphasizing quality, naturalness, and diversity. To tackle the vast space of possible optimizations, we adapt deep reinforcement learning to train the agent sample-efficiently through interaction with a world model of the compiler environment. Evaluation on both benchmark suites and production-level code optimization problems demonstrates the agent's supercompiler performance and zero-shot generalization ability, outperforming the built-in optimization options designed by compiler experts. Our methodology highlights the great potential of artificial intelligence for engineering and paves the way for scaling machine learning techniques in code optimization.