Large pre-trained language models are now widely used in neural code completion systems. Although large code models significantly outperform their smaller counterparts, around 70% of the code completions displayed by Copilot are not accepted by developers. Because these completions are reviewed but not accepted, they contribute little to developer productivity. Worse, given the high cost of large code models, they represent a substantial waste of computing resources and energy. To fill this gap, we first investigate the prompts of unhelpful code completions and empirically identify four observable patterns that give rise to such prompts, all of which are inherent; that is, they can hardly be addressed by improving the accuracy of the model. This demonstrates the feasibility of identifying such prompts from the prompts themselves. Motivated by this finding, we propose an early-rejection mechanism that turns down low-return prompts by predicting completion quality before the prompts are sent to the code completion system. Furthermore, we propose a lightweight Transformer-based estimator to demonstrate the feasibility of the mechanism. Experimental results show that the proposed estimator saves 23.3% of the computational cost of code completion systems, measured in floating-point operations, and that 80.2% of rejected prompts lead to unhelpful completions.
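The early-rejection idea can be illustrated with a minimal Python sketch. This is not the implementation from this work: the names `EarlyRejectionGate`, `toy_estimator`, and `fake_model` are hypothetical, the threshold of 0.5 is an assumed value, and the length-based score is only a stand-in for the lightweight Transformer-based estimator (it is not one of the four patterns, which the abstract does not enumerate). The sketch shows only the control flow: a cheap estimator scores the prompt, and the expensive model is invoked only when the score clears the threshold.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EarlyRejectionGate:
    """Forward a prompt to the code model only if its predicted quality is high enough."""

    estimator: Callable[[str], float]  # hypothetical: prompt -> predicted quality in [0, 1]
    complete: Callable[[str], str]     # hypothetical: the expensive large code model
    threshold: float = 0.5             # assumed cut-off, not a value from the source

    def __call__(self, prompt: str) -> Optional[str]:
        if self.estimator(prompt) < self.threshold:
            return None                # rejected: the large model is never invoked
        return self.complete(prompt)


def toy_estimator(prompt: str) -> float:
    """Stand-in for the learned estimator: scores by stripped prompt length only."""
    return min(len(prompt.strip()) / 40.0, 1.0)


calls = []  # records every prompt that actually reaches the (fake) large model


def fake_model(prompt: str) -> str:
    """Stand-in for the large code model; returns a fixed completion."""
    calls.append(prompt)
    return "pass"


gate = EarlyRejectionGate(estimator=toy_estimator, complete=fake_model)
short_result = gate("x =")                                          # low score: rejected
long_result = gate("def parse_config(path: str) -> dict:\n    ")   # high score: forwarded
```

In this sketch a rejected prompt costs only one estimator call, so any saving depends on the estimator being much cheaper than the model it guards and on the rejected prompts being ones whose completions would mostly have gone unaccepted.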