A Transformer-Based Approach for Smart Invocation of Automatic Code Completion

Transformer-based language models are highly effective for code completion, and much research has been dedicated to improving the content of their completions. Despite their effectiveness, these models have high operational costs and can be intrusive, especially when they suggest too often and interrupt developers who are concentrating on their work. Current research largely overlooks how these models interact with developers in practice and does not address when a developer should receive completion suggestions. To tackle this issue, we developed a machine learning model that can accurately predict when to invoke a code completion tool, given the code context and available telemetry data. To do so, we collected a dataset of 200k developer interactions with our cross-IDE code completion plugin and trained several invocation filtering models. Our results indicate that our small-scale transformer model significantly outperforms the baseline while maintaining sufficiently low latency. We further explored the search space for integrating additional telemetry data directly into a pre-trained transformer and obtained promising results. To further demonstrate our approach's practical potential, we deployed the model in an online environment with 34 developers and report real-world insights based on 74k actual invocations.
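To make the idea of an invocation filter concrete, the following is a minimal sketch of gating completion requests on a predicted probability. It is not the model described in the abstract: a hand-weighted logistic scorer stands in for the transformer, and the feature names (`ms_since_last_completion`, `typing_speed_cps`), the weights, and the threshold are all hypothetical choices made only for illustration.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Telemetry:
    """Hypothetical telemetry features; the abstract does not list the real ones."""
    ms_since_last_completion: float
    typing_speed_cps: float  # characters typed per second


# Illustrative hand-picked parameters (assumption); a real filter would learn them.
WEIGHTS = [2.0, -1.5, 0.2, -0.3]
BIAS = -0.5


def context_features(prefix: str) -> list[float]:
    """Toy stand-in for a learned encoding of the code before the cursor."""
    return [
        1.0 if prefix.endswith((".", "(")) else 0.0,  # member access or call just opened
        1.0 if prefix[-1:].isspace() else 0.0,        # cursor sits after whitespace
    ]


def invocation_probability(
    prefix: str,
    telemetry: Telemetry,
    weights: Sequence[float] = WEIGHTS,
    bias: float = BIAS,
) -> float:
    """Score how likely a completion is wanted, using a logistic model."""
    feats = context_features(prefix) + [
        min(telemetry.ms_since_last_completion / 1000.0, 10.0),  # seconds, capped
        min(telemetry.typing_speed_cps, 20.0),                   # capped
    ]
    z = bias + sum(w * f for w, f in zip(weights, feats))
    return 1.0 / (1.0 + math.exp(-z))


def should_invoke(prefix: str, telemetry: Telemetry, threshold: float = 0.5) -> bool:
    """Call the (expensive) completion model only when the score clears the threshold."""
    return invocation_probability(prefix, telemetry) >= threshold
```

In this sketch, a plugin would call `should_invoke` on each keystroke or trigger event and forward the request to the completion model only when it returns `True`. Raising `threshold` trades fewer interruptions and lower serving cost against more missed suggestions.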

