Code optimization is a daunting task that demands significant expertise, even from experienced programmers. Even this expertise struggles to keep pace with the rapid development of new hardware architectures. To advance the code optimization process as a whole, recent approaches rely on machine learning and artificial intelligence techniques. This paper introduces a new framework that reduces the complexity of code optimization. The proposed framework builds on large language models (LLMs) and reinforcement learning (RL), and enables LLMs to receive feedback from their environment (i.e., unit tests) during fine-tuning. We compare our framework with existing state-of-the-art models and show that it is more efficient in terms of speed and computational cost, owing to the reduced number of training steps and its applicability to models with fewer parameters. Additionally, our framework reduces the likelihood of logical and syntactic errors. To evaluate our approach, we run several experiments on the PIE dataset using a CodeT5 language model and RRHF, a recent reinforcement learning algorithm. We adopt a variety of evaluation metrics covering optimization quality and speedup. The evaluation results demonstrate that the proposed framework achieves results comparable to existing models while using shorter training times and smaller pre-trained models. In particular, we achieve increases of 5.6% and 2.2 over the baseline models on the %OPT and SP metrics, respectively.
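The abstract does not spell out the training objective, so the following is only a minimal sketch of how unit-test feedback could be combined with an RRHF-style ranking loss. The ranking and fine-tuning terms follow the general form published for RRHF (length-normalized log-probabilities, a pairwise hinge on mis-ranked candidates, plus a negative log-likelihood term on the best candidate). The reward function `unit_test_reward`, its use of a runtime ratio, and all names and numbers here are assumptions made for illustration, not the authors' implementation.

```python
def unit_test_reward(passed_all_tests, baseline_seconds, candidate_seconds):
    """Hypothetical reward: 0 for a candidate that fails any unit test,
    otherwise the runtime ratio baseline / candidate (higher is faster)."""
    if not passed_all_tests:
        return 0.0
    return baseline_seconds / candidate_seconds


def length_normalized_logprob(token_logprobs):
    """Score of a candidate: mean per-token log-probability under the model."""
    return sum(token_logprobs) / len(token_logprobs)


def rrhf_style_loss(candidates):
    """candidates: list of (token_logprobs, reward) pairs for one input program.

    Returns ranking loss + fine-tuning loss:
      ranking: sum over pairs with reward_i < reward_j of max(0, score_i - score_j)
      fine-tuning: negative log-likelihood of the highest-reward candidate
    """
    scores = [length_normalized_logprob(lp) for lp, _ in candidates]
    rewards = [r for _, r in candidates]
    n = len(candidates)

    rank_loss = 0.0
    for i in range(n):
        for j in range(n):
            if rewards[i] < rewards[j]:
                # Penalize the model when a worse candidate scores higher.
                rank_loss += max(0.0, scores[i] - scores[j])

    best = max(range(n), key=lambda k: rewards[k])
    sft_loss = -sum(candidates[best][0])
    return rank_loss + sft_loss


# Two made-up candidates: one correct and 2x faster, one failing its tests.
example = [
    ([-0.1, -0.2], unit_test_reward(True, 2.0, 1.0)),
    ([-0.05, -0.05], unit_test_reward(False, 2.0, 1.0)),
]
loss = rrhf_style_loss(example)
```

In this toy example the failing candidate has the higher model score, so the ranking term is non-zero and pushes the model toward the candidate that passes its tests and runs faster. In practice the per-token log-probabilities would come from the language model being fine-tuned, and the loss would be computed with a tensor library so that gradients can flow.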