Deep learning has been widely adopted for code-based tasks, with deep code models built from large corpora of code snippets. Although these models have achieved great success, even state-of-the-art ones suffer from noise in their inputs, which leads to erroneous predictions. Models can be enhanced through retraining or fine-tuning, but this is not a once-and-for-all solution and incurs significant overhead. In particular, such techniques cannot improve the performance of (deployed) models on the fly. Input denoising techniques exist in other domains (such as image processing), but because code input is discrete and must strictly obey complex syntactic and semantic constraints, those techniques are largely inapplicable to code. In this work, we propose CodeDenoise, the first input denoising technique for deep code models. Its key idea is to localize noisy identifiers in (likely) mispredicted inputs and to denoise such inputs by cleansing the located identifiers. It does not retrain or reconstruct the model; it only cleanses inputs on the fly to improve performance. Our experiments on 18 deep code models (three pre-trained models on six code-based datasets) demonstrate the effectiveness and efficiency of CodeDenoise. For example, on average across all subjects, CodeDenoise successfully denoises 21.91% of mispredicted inputs and improves the accuracy of the original models by 2.04%, spending an average of 0.48 seconds on each input and substantially outperforming the widely used fine-tuning strategy.
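The localize-then-cleanse idea can be illustrated with a toy sketch. This is not the CodeDenoise algorithm, whose details the abstract does not give; it is a minimal greedy stand-in under stated assumptions: the input is Python source, the model is replaced by a hypothetical `toy_confidence` function, and the helper names `identifiers`, `rename`, and `denoise` are invented for illustration. Renaming operates on tokens rather than raw text, so the cleansed input remains syntactically valid, which is the constraint that sets code apart from image inputs.

```python
import builtins
import io
import keyword
import tokenize


def identifiers(code):
    """Return user-chosen names in `code`, in first-seen order."""
    seen = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        if (tok.type == tokenize.NAME
                and not keyword.iskeyword(tok.string)
                and not hasattr(builtins, tok.string)
                and tok.string not in seen):
            seen.append(tok.string)
    return seen


def rename(code, old, new):
    """Rename every NAME token equal to `old`; all other tokens are kept.

    Spacing may change (untokenize runs in its 2-tuple compatibility mode),
    but the token sequence, and hence the syntax, is preserved.
    """
    toks = [
        (t.type, new if t.type == tokenize.NAME and t.string == old else t.string)
        for t in tokenize.generate_tokens(io.StringIO(code).readline)
    ]
    return tokenize.untokenize(toks)


def denoise(code, confidence, candidates):
    """Greedy toy denoiser (an assumption, not the paper's method).

    Tries every (identifier, candidate) rename and keeps the single rename
    that raises the confidence score the most.
    """
    existing = identifiers(code)
    best_code, best_score = code, confidence(code)
    for old in existing:
        for new in candidates:
            if new in existing:
                continue  # skip renames that would clash with another name
            trial = rename(code, old, new)
            score = confidence(trial)
            if score > best_score:
                best_code, best_score = trial, score
    return best_code, best_score


def toy_confidence(code):
    """Hypothetical stand-in for a model: the name 'zzz' acts as noise."""
    return 0.4 if "zzz" in identifiers(code) else 0.9


noisy = "def add(a, zzz):\n    return a + zzz\n"
cleaned, score = denoise(noisy, toy_confidence, candidates=["b", "total"])
```

In this sketch the model itself is never modified; only the input changes, which mirrors the on-the-fly property described above. A real system would also need scope-aware renaming, since this token-level version renames attribute names and same-named variables in unrelated scopes alike.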