Transformer-based language models are highly effective for code completion, and much research has been dedicated to improving the content of these completions. Despite their effectiveness, these models are costly to operate and can be intrusive, especially when they offer suggestions too often and interrupt developers who are concentrating on their work. Current research largely overlooks how these models interact with developers in practice and does not address when a developer should receive completion suggestions. To tackle this issue, we develop a machine learning model that accurately predicts when to invoke a code completion tool, given the code context and available telemetry data. To this end, we collect a dataset of 200k developer interactions with our cross-IDE code completion plugin and train several invocation filtering models. Our results indicate that our small-scale transformer model significantly outperforms the baseline while keeping latency sufficiently low. We further explore the search space for integrating additional telemetry data directly into a pre-trained transformer and obtain promising results. To further demonstrate the practical potential of our approach, we deploy the model in an online environment with 34 developers and provide real-world insights based on 74k actual invocations.
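To make the notion of invocation filtering concrete, the following is a minimal sketch of a filter that decides whether to request a completion from the code context and a few telemetry signals. It is not the transformer model described above: the feature names (`seconds_since_last_completion`, `chars_typed_since_last_completion`, `cursor_at_line_end`), the trigger characters, the weights, and the threshold are all hypothetical placeholders chosen for illustration, and a trained model would learn such a scoring function from logged interactions.

```python
import math
from dataclasses import dataclass


@dataclass
class Telemetry:
    """Hypothetical telemetry signals; the actual feature set is an assumption."""
    seconds_since_last_completion: float
    chars_typed_since_last_completion: int
    cursor_at_line_end: bool


def invocation_score(code_prefix: str, t: Telemetry) -> float:
    """Return a score in (0, 1) estimating whether a suggestion is wanted.

    The weights below are hand-picked placeholders, not learned values.
    """
    last_char = code_prefix[-1] if code_prefix else ""
    z = -1.0                                              # bias: default to not invoking
    z += 1.5 if last_char in (".", "(") else 0.0          # assumed trigger characters
    z += 0.8 if t.cursor_at_line_end else -0.8            # mid-line edits are penalized
    z += 0.1 * min(t.chars_typed_since_last_completion, 20)
    z -= 1.0 if t.seconds_since_last_completion < 1.0 else 0.0  # avoid rapid re-invocation
    return 1.0 / (1.0 + math.exp(-z))                     # logistic squashing


def should_invoke(code_prefix: str, t: Telemetry, threshold: float = 0.5) -> bool:
    """Invoke the (expensive) completion model only when the score clears the threshold."""
    return invocation_score(code_prefix, t) >= threshold


# Usage: a developer has just typed "obj." at the end of a line after a pause.
likely = Telemetry(seconds_since_last_completion=5.0,
                   chars_typed_since_last_completion=12,
                   cursor_at_line_end=True)
# Usage: a developer is editing mid-line right after a previous suggestion.
unlikely = Telemetry(seconds_since_last_completion=0.2,
                     chars_typed_since_last_completion=1,
                     cursor_at_line_end=False)
```

In this sketch the filter runs before the completion model, so a cheap scoring step can suppress requests that are unlikely to be useful; raising `threshold` trades fewer interruptions and lower cost against missed suggestions.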